* Questions about software RAID
@ 2005-04-18 19:50 tmp
2005-04-18 20:12 ` David Greaves
` (2 more replies)
0 siblings, 3 replies; 26+ messages in thread
From: tmp @ 2005-04-18 19:50 UTC (permalink / raw)
To: linux-raid
I read the software RAID-HOWTO, but the 6 questions below are still
unclear to me. I have asked around on IRC channels and it seems that I am
not the only one who is confused. Maybe the HOWTO could be updated to
clarify the items below?
1) I have a RAID-1 setup with one spare disk. A disk crashes and the
spare disk takes over. Now, when the crashed disk is replaced with a new
one, what happens to the role of the spare disk? Is it
reverting to its old role as spare disk?
If it is NOT reverting to its old role, then the raidtab file will
suddenly be out of sync with reality. Is that correct?
Does the answer given here differ in e.g. RAID-5 setups?
2) The new disk has to be manually partitioned before being used in the
array. What happens if the new partitions are larger than other
partitions used in the array? What happens if they are smaller?
3) Must all partition types be 0xFD? What happens if they are not?
4) I guess the partitions themselves don't have to be formatted as the
filesystem is on the RAID level. Is that correct?
5) Removing a disk requires that I do a "mdadm -r" on all the partitions
that are involved in a RAID array. I intend to buy a hot-swap capable
controller, so what happens if I just pull out the disk without this
manual removal command?
Isn't there a more hotswap-friendly setup?
6) I know that the kernel does striping automatically if multiple
partitions are given as swap partitions in /etc/fstab. But can it also
handle it if one disk crashes? I.e. do I have to let my swap disk be a
RAID setup too if I want it to continue upon disk crash?
Thanks!
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Questions about software RAID
2005-04-18 19:50 Questions about software RAID tmp
@ 2005-04-18 20:12 ` David Greaves
2005-04-18 23:12 ` tmp
2005-04-18 20:15 ` Questions about software RAID Peter T. Breuer
2005-04-18 20:50 ` Frank Wittig
2 siblings, 1 reply; 26+ messages in thread
From: David Greaves @ 2005-04-18 20:12 UTC (permalink / raw)
To: tmp; +Cc: linux-raid
tmp wrote:
>I read the software RAID-HOWTO, but the 6 questions below are still
>unclear to me. I have asked around on IRC channels and it seems that I am
>not the only one who is confused. Maybe the HOWTO could be updated to
>clarify the items below?
>
>
>1) I have a RAID-1 setup with one spare disk. A disk crashes and the
>spare disk takes over. Now, when the crashed disk is replaced with a new
>one, what happens to the role of the spare disk?
>
the new disk is spare, the array doesn't revert to its original state.
> Is it
>reverting to its old role as spare disk?
>
>
so no it doesn't.
>If it is NOT reverting to its old role, then the raidtab file will
>suddenly be out of sync with reality. Is that correct?
>
>
yes
raidtab is deprecated - man mdadm
>Does the answer given here differ in e.g. RAID-5 setups?
>
>
no
>
>2) The new disk has to be manually partitioned before being used in the
>array.
>
no it doesn't. You could use the whole disk (/dev/hdb).
In general, AFAIK, partitions are better as they allow automatic
assembly at boot.
> What happens if the new partitions are larger than other
>partitions used in the array?
>
nothing special - eventually, if you replace all the partitions with
bigger ones you can 'grow' the array
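Something like this, as a sketch (assuming an mdadm recent enough to have
grow mode; md0 is just an example device):
  mdadm --grow /dev/md0 --size=max   # use the extra space on the new components
then grow the filesystem on /dev/md0 with the usual tools (e.g. resize2fs
for ext2/ext3).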
> What happens if they are smaller?
>
>
it won't work (doh!)
>
>3) Must all partition types be 0xFD? What happens if they are not?
>
>
no
They won't be autodetected by the _kernel_
>
>4) I guess the partitions themselves don't have to be formatted as the
>filesystem is on the RAID level. Is that correct?
>
>
compulsory!
>
>5) Removing a disk requires that I do a "mdadm -r" on all the partitions
>that are involved in a RAID array. I intend to buy a hot-swap capable
>controller, so what happens if I just pull out the disk without this
>manual removal command?
>
>
as far as md is concerned the disk disappeared.
I _think_ this is just like mdadm -r.
>Isn't there a more hotswap-friendly setup?
>
>
What's unfriendly?
>
>6) I know that the kernel does striping automatically if multiple
>partitions are given as swap partitions in /etc/fstab. But can it also
>handle it if one disk crashes?
>
no - striping <> mirroring
The kernel will fail to read data on the crashed disk - game over.
> I.e. do I have to let my swap disk be a
>RAID setup too if I want it to continue upon disk crash?
>
>
yes - a mirror, not a stripe.
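A sketch of the mirrored-swap setup (device names are made up, adjust to
your disks):
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
  mkswap /dev/md1
  swapon /dev/md1
and point the swap line in /etc/fstab at /dev/md1 instead of the raw
partitions.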
David
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Questions about software RAID
2005-04-18 19:50 Questions about software RAID tmp
2005-04-18 20:12 ` David Greaves
@ 2005-04-18 20:15 ` Peter T. Breuer
2005-04-18 20:50 ` Frank Wittig
2 siblings, 0 replies; 26+ messages in thread
From: Peter T. Breuer @ 2005-04-18 20:15 UTC (permalink / raw)
To: linux-raid
tmp <skrald@amossen.dk> wrote:
> 1) I have a RAID-1 setup with one spare disk. A disk crashes and the
> spare disk takes over. Now, when the crashed disk is replaced with a new
> one, what happens to the role of the spare disk? Is it
> reverting to its old role as spare disk?
Try it and see. Run raidsetfaulty on one disk. That will bring the
spare in. Run raidhotremove on the original. Then "replace" it
with raidhotadd.
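With mdadm the same exercise would look something like this (array and
device names here are only examples):
  mdadm /dev/md0 --fail /dev/sdb1      # like raidsetfaulty
  mdadm /dev/md0 --remove /dev/sdb1    # like raidhotremove
  mdadm /dev/md0 --add /dev/sdc1       # like raidhotadd
Watch /proc/mdstat while you do it and you will see what role each device
ends up in.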
> If it is NOT reverting to its old role, then the raidtab file will
> suddenly be out of sync with reality. Is that correct?
Shrug. It was "out of sync" as you call it the moment the spare disk
started to be used not as a spare but as part of the array.
> Does the answer given here differ in e.g. RAID-5 setups?
No.
> 2) The new disk has to be manually partitioned before being used in the
> array. What happens if the new partitions are larger than other
> partitions used in the array?
Bigger is fine, obviously!
> What happens if they are smaller?
They can't be used.
> 3) Must all partition types be 0xFD? What happens if they are not?
They can be anything you like. If they aren't, then the kernel
can't set them up at boot.
> 4) I guess the partitions themselves don't have to be formatted as the
> filesystem is on the RAID level. Is that correct?
?? Sentence does not compute, I am afraid.
> 5) Removing a disk requires that I do a "mdadm -r" on all the partitions
> that are involved in a RAID array.
Does it? Well, I see that you mean "removing a disk intentionally".
> I intend to buy a hot-swap capable
> controller, so what happens if I just pull out the disk without this
> manual removal command?
The disk will error at the next access and will be faulted out of the
array.
> Isn't there a more hotswap-friendly setup?
?? Not sure what you mean. You mean, can you program the hotplug system
to do a setfaulty and remove from the array? Yes. Look at your hotplug
scripts in /etc/hotplug. But it's always going to be late whatever it
does, given that pulling the disk is the trigger!
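A minimal sketch of such a hook, assuming you already know which array the
pulled disk belonged to (names are examples, not a tested recipe):
  # run from the hotplug remove event for the disk
  mdadm /dev/md0 --fail /dev/sdb1 && mdadm /dev/md0 --remove /dev/sdb1
but as said, md will have noticed the failure by itself anyway.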
> 6) I know that the kernel does striping automatically if multiple
> partitions are given as swap partitions in /etc/fstab. But can it also
> handle it if one disk crashes? I.e. do I have to let my swap disk be a
> RAID setup too if I want it to continue upon disk crash?
People have recently pointed out that raiding your swap makes sense
exactly in order to cope robustly with this eventuality. You'd have had
to raid everything ELSE on the dead disk, of course, so I'm not quite
as sure as everyone else that it's a truly staggeringly wonderful idea.
Peter
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Questions about software RAID
2005-04-18 19:50 Questions about software RAID tmp
2005-04-18 20:12 ` David Greaves
2005-04-18 20:15 ` Questions about software RAID Peter T. Breuer
@ 2005-04-18 20:50 ` Frank Wittig
2 siblings, 0 replies; 26+ messages in thread
From: Frank Wittig @ 2005-04-18 20:50 UTC (permalink / raw)
To: tmp; +Cc: linux-raid
tmp wrote:
>2) The new disk has to be manually partitioned before being used in the
>array. What happens if the new partitions are larger than other
>partitions used in the array? What happens if they are smaller?
>
>
there's no problem creating partitions which have exactly the same size
as the old ones.
your disks can be from a different manufacturer, have different sizes, a
different number of physical heads or anything else.
if you set up your disks to have the same geometry (heads, cylinders,
sectors) you can have partitions of exactly the same size. (the extra size
of larger disks won't be lost.)
read "man fdisk" and have a look at the parameters -C, -H and -S...
greetings,
frank
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Questions about software RAID
2005-04-18 20:12 ` David Greaves
@ 2005-04-18 23:12 ` tmp
2005-04-19 6:36 ` Peter T. Breuer
` (2 more replies)
0 siblings, 3 replies; 26+ messages in thread
From: tmp @ 2005-04-18 23:12 UTC (permalink / raw)
To: David Greaves; +Cc: linux-raid
Thanks for your answers! They led to a couple of new questions,
however. :-)
I've read "man mdadm" and "man mdadm.conf" but I certainly don't have
an overview of software RAID.
> yes
> raidtab is deprecated - man mdadm
OK. The HOWTO describes mostly a raidtools context, however. Is the
following correct then?
mdadm.conf may be considered as the replacement for raidtab. When mdadm
starts it consults this file and starts the raid arrays correspondingly.
This leads to the following:
a) If mdadm starts the arrays, how can I then boot from a RAID device
(mdadm isn't started upon boot)?
I don't quite get which parts of the RAID system are controlled by the
kernel and which parts are controlled by mdadm.
b) Whenever I replace disks, the runtime configuration changes. I assume
that I should manually edit mdadm.conf in order to make it correspond to
reality?
> >2) The new disk has to be manually partitioned before being used in the
> >array.
> no it doesn't. You could use the whole disk (/dev/hdb).
> In general, AFAIK, partitions are better as they allow automatic
> assembly at boot.
Is it correct that I can use whole disks (/dev/hdb) only if I make a
partitionable array and thus create the partitions UPON the raid
mechanism?
As far as I can see, partitionable arrays make disk replacements easier
as you can just replace the disk and let the RAID software take care of
syncing the new disk with the existing partitioning. Is that correct?
You say I can't boot from such a partitionable raid array. Is that
correctly understood?
Can I "grow" a partitionable raid array if I replace the existing disks
with larger ones later?
Would you prefer manually partitioned disks, even though disk replacements
are a bit more difficult?
I guess that mdadm automatically writes persistent superblocks to all
disks?
> >3) Must all partition types be 0xFD? What happens if they are not?
> no
> They won't be autodetected by the _kernel_
OK, so it is generally a good idea to always set the partition types to
0xFD, I guess.
> >4) I guess the partitions themselves don't have to be formatted as the
> >filesystem is on the RAID level. Is that correct?
> compulsory!
I meant, the /dev/mdX has to be formatted, not the individual
partitions. Still right?
> >5) Removing a disk requires that I do a "mdadm -r" on all the partitions
> >that are involved in a RAID array. I intend to buy a hot-swap capable
> >controller, so what happens if I just pull out the disk without this
> >manual removal command?
> as far as md is concerned the disk disappeared.
> I _think_ this is just like mdadm -r.
So I could actually just pull out the disk, insert a new one and do a
"mdadm -a /dev/mdX /dev/sdY"?
The RAID system won't detect the newly inserted disk itself?
> > I.e. do I have to let my swap disk be a
> >RAID setup too if I want it to continue upon disk crash?
> yes - a mirror, not a stripe.
OK. Depending on your recommendations above, I could either make it a
swap partition on a partitionable array or create an array for the swap
in the conventional way (out of existing partitions).
Thanks again for your help!
Is there a HOWTO out there that is up-to-date and based on RAID
usage with mdadm and kernel 2.6 instead of raidtools and kernel 2.2/2.4?
I can't possibly be the only one with these newbie questions. :-)
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Questions about software RAID
2005-04-18 23:12 ` tmp
@ 2005-04-19 6:36 ` Peter T. Breuer
2005-04-19 7:15 ` Luca Berra
2005-04-19 12:08 ` Don't use whole disks for raid arrays [was: Questions about software RAID] Michael Tokarev
2 siblings, 0 replies; 26+ messages in thread
From: Peter T. Breuer @ 2005-04-19 6:36 UTC (permalink / raw)
To: linux-raid
tmp <skrald@amossen.dk> wrote:
> I've read "man mdadm" and "man mdadm.conf" but I certainly don't have
> an overview of software RAID.
Then try using it instead of/as well as reading about it, and you will
obtain a more comprehensive understanding.
> OK. The HOWTO describes mostly a raidtools context, however. Is the
> following correct then?
> mdadm.conf may be considered as the replacement for raidtab. When mdadm
No. Mdadm (generally speaking) does NOT use a configuration file and
that is perhaps its major difference wrt raidtools. It's command
line. You can see for yourself what the man page itself summarises as
the differences (the one about not using a configuration file is #2 of
3):
mdadm is a program that can be used to create, manage, and monitor
MD devices. As such it provides a similar set of functionality to
the raidtools packages. The key differences between mdadm and
raidtools are:
mdadm is a single program and not a collection of programs.
mdadm can perform (almost) all of its functions without having
a configuration file and does not use one by default. Also mdadm
helps with management of the configuration file.
mdadm can provide information about your arrays (through Query,
Detail, and Examine) that raidtools cannot.
> starts it consults this file and starts the raid arrays correspondingly.
No. As far as I am aware, the config file contains such details of
existing raid arrays as may conveniently be discovered during a
physical scan, and as such contains only redundant information that at
most may save the cost of a physical scan during such operations as may
require it.
Feel free to correct me!
> This leads to the following:
Then I'll ignore it :-).
> Is it correct that I can use whole disks (/dev/hdb) only if I make a
> partitionable array and thus create the partitions UPON the raid
> mechanism?
Incomprehensible, I am afraid. You can use either partitions or whole
disks in a raid array.
> As far as I can see, partitionable arrays makes disk replacements easier
Oh - you mean that the partitions can be recognized at bootup by the
kernel.
> You say I can't boot from such a partitionable raid array. Is that
> correctly understood?
Partitionable? Or partitioned? I'm not sure what you mean.
You would be able to boot via lilo from a partitioned RAID1 array, since
all lilo requires is a block map of where to read the kernel image from,
and either component of the RAID1 would do, and I'm sure that lilo has
been altered to allow the use of both/either component's blockmap during
its startup routines.
I don't know if grub can boot from a RAID1 array but it strikes me as
likely since it would be able to ignore the raid1-ness and boot
successfully just as though it were a (pre-raid-aware) lilo.
> Can I "grow" a partitionable raid array if I replace the existing disks
> with larger ones later?
Partitionable? Or partitioned? If you grew the array you would be
extending it beyond the last partition. The partition table itself is in
sector zero, so it is not affected. You would presumably next change
the partitions to take advantage of the increased size available.
> Would you prefer manually partitioned disks, even though disk replacements
> are a bit more difficult?
I don't understand.
> I guess that mdadm automatically writes persistent superblocks to all
> disks?
By default, yes?
> I meant, the /dev/mdX has to be formatted, not the individual
> partitions. Still right?
I'm not sure what you mean. You mean "/dev/mdXy" by "individual
partitions"?
> So I could actually just pull out the disk, insert a new one and do a
> "mdadm -a /dev/mdX /dev/sdY"?
You might want to check that the old one has been removed as well as faulted
first. I would imagine it is "only" faulted. But it doesn't matter.
> The RAID system won't detect the newly inserted disk itself?
It obeys commands. You can program the hotplug system to add it in
automatically.
> Is there a HOWTO out there that is up-to-date and based on RAID
> usage with mdadm and kernel 2.6 instead of raidtools and kernel 2.2/2.4?
What there is seems fine to me if you can use the mdadm equivalents
instead of raidhotadd and raidsetfaulty and raidhotremove and mkraid.
The config file is not needed.
Peter
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Questions about software RAID
2005-04-18 23:12 ` tmp
2005-04-19 6:36 ` Peter T. Breuer
@ 2005-04-19 7:15 ` Luca Berra
2005-04-19 8:08 ` David Greaves
2005-04-19 12:08 ` Don't use whole disks for raid arrays [was: Questions about software RAID] Michael Tokarev
2 siblings, 1 reply; 26+ messages in thread
From: Luca Berra @ 2005-04-19 7:15 UTC (permalink / raw)
To: linux-raid
On Tue, Apr 19, 2005 at 01:12:16AM +0200, tmp wrote:
>mdadm.conf may be considered as the replacement for raidtab. When mdadm
>starts it consults this file and starts the raid arrays correspondingly.
>This leads to the following:
yes, and no
mdadm does not need a configuration file, but the config file helps.
check
http://cvs.mandrakesoft.com/cgi-bin/cvsweb.cgi/SPECS/mdadm/raidtabtomdadm.sh
for a script to convert from an existing raidtab to mdadm.conf
>a) If mdadm starts the arrays, how can I then boot from a RAID device
>(mdadm isn't started upon boot)?
>I don't quite get which parts of the RAID system are controlled by the
>kernel and which parts are controlled by mdadm.
the best choice is having an initrd containing mdassemble (part of
mdadm) and the configuration file.
the second best choice is using the kernel command line to assemble
the raid array.
the last resort is using FD partitions and in-kernel autodetect.
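the kernel command line form looks roughly like this (device names are
examples; see Documentation/md.txt in the kernel source for the exact
syntax):
  md=0,/dev/sda1,/dev/sdb1 root=/dev/md0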
>b) Whenever I replace disks, the runtime configuration changes. I assume
>that I should manually edit mdadm.conf in order to make it correspond to
>reality?
no, the mdadm configuration file only contains information on how to
identify the raid components, not their status. if you only use the UUID
to identify the array you will be able to find it whatever you do to it.
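a minimal mdadm.conf along those lines (the UUID below is made up; take
yours from "mdadm --detail /dev/md0" or "mdadm --examine --scan"):
  DEVICE partitions
  ARRAY /dev/md0 UUID=3aaa0791:93ae8a9c:01f9e43d:ac30fbff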
>> >2) The new disk has to be manually partitioned before being used in the
>> >array.
>> no it doesn't. You could use the whole disk (/dev/hdb).
>> In general, AFAIK, partitions are better as they allow automatic
>> assembly at boot.
>
>Is it correct that I can use whole disks (/dev/hdb) only if I make a
>partitionable array and thus create the partitions UPON the raid
>mechanism?
no, you can use a whole disk as a whole disk, there is no law that you
have to partition it. usually you do because it is easier to manage, but
you could use LVM instead of partitions.
>As far as I can see, partitionable arrays make disk replacements easier
>as you can just replace the disk and let the RAID software take care of
>syncing the new disk with the existing partitioning. Is that correct?
layering the partitions above the raid array is easier to manage.
>You say I can't boot from such a partitionable raid array. Is that
>correctly understood?
why not?
>Can I "grow" a partitionable raid array if I replace the existing disks
>with larger ones later?
yes, you will have free (non partitioned) space at the end.
>Would you prefer manually partitioned disks, even though disk replacements
>are a bit more difficult?
YMMV
>I guess that mdadm automatically writes persistent superblocks to all
>disks?
unless you tell it not to, when creating an array with mdadm it writes a
persistent superblock.
>> >3) Must all partition types be 0xFD? What happens if they are not?
>> no
>> They won't be autodetected by the _kernel_
>
>OK, so it is generally a good idea to always set the partition types to
>0xFD, I guess.
many people find it easier to understand if raid partitions are set to
0XFD. kernel autodetection is broken and should not be relied upon.
>> >4) I guess the partitions themselves don't have to be formatted as the
>> >filesystem is on the RAID level. Is that correct?
>> compulsory!
>
>I meant, the /dev/mdX has to be formatted, not the individual
>partitions. Still right?
compulsory! if you do anything on the individual components you'll damage data.
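i.e. something like (ext3 is just an example filesystem):
  mkfs.ext3 /dev/md0       # format the array device
never mkfs /dev/sda1, /dev/sdb1 or whatever the components are.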
>> >5) Removing a disk requires that I do a "mdadm -r" on all the partitions
>> >that are involved in a RAID array. I intend to buy a hot-swap capable
>> >controller, so what happens if I just pull out the disk without this
>> >manual removal command?
>> as far as md is concerned the disk disappeared.
>> I _think_ this is just like mdadm -r.
i think it will be marked faulty, not removed.
>So I could actually just pull out the disk, insert a new one and do a
>"mdadm -a /dev/mdX /dev/sdY"?
>The RAID system won't detect the newly inserted disk itself?
no, think of it as flexibility. if you want you can build something
using the "hotplug" subsystem.
...
>Is there a HOWTO out there that is up-to-date and based on RAID
>usage with mdadm and kernel 2.6 instead of raidtools and kernel 2.2/2.4?
>I can't possibly be the only one with these newbie questions. :-)
one last word:
never trust howtos (they should be called howidid), they have a
tendency to apply to the author's configuration, not yours.
general documentation is far more accurate.
L.
--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
X AGAINST HTML MAIL
/ \
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Questions about software RAID
2005-04-19 7:15 ` Luca Berra
@ 2005-04-19 8:08 ` David Greaves
2005-04-19 12:18 ` Michael Tokarev
0 siblings, 1 reply; 26+ messages in thread
From: David Greaves @ 2005-04-19 8:08 UTC (permalink / raw)
To: linux-raid
Luca Berra wrote:
> many people find it easier to understand if raid partitions are set to
> 0XFD. kernel autodetection is broken and should not be relied upon.
Could you clarify what is broken?
I understood that it was simplistic (i.e. if you have a raid0 built over
a raid5 or something exotic then it may have problems) but essentially worked.
Could it be:
* broken for complex raid on raid
* broken for root devices
* fine for 'simple', non-root devices
>
>
>>> >4) I guess the partitions themselves don't have to be formatted as the
>>> >filesystem is on the RAID level. Is that correct?
>>> compulsory!
>>
>>
>> I meant, the /dev/mdX has to be formatted, not the individual
>> partitions. Still right?
>
> compulsory! if you do anything on the individual components you'll
> damage data.
>
>>> >5) Removing a disk requires that I do a "mdadm -r" on all the
>>> >partitions that are involved in a RAID array. I intend to buy a hot-swap
>>> >capable controller, so what happens if I just pull out the disk without
>>> >this manual removal command?
>>> as far as md is concerned the disk disappeared.
>>> I _think_ this is just like mdadm -r.
>>
> i think it will be marked faulty, not removed.
yep - you're right, I remember now.
You have to mdadm -r remove it and re-add it once you restore the disk.
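i.e. something like this (array and disk names are examples):
  mdadm /dev/md0 --remove /dev/sdb1   # clear the faulty slot
  mdadm /dev/md0 --add /dev/sdb1      # re-add once the disk is back/replaced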
>
>> So I could actually just pull out the disk, insert a new one and do a
>> "mdadm -a /dev/mdX /dev/sdY"?
>> The RAID system won't detect the newly inserted disk itself?
>
> no, think of it as flexibility. if you want you can build something
> using the "hotplug" subsystem.
or:
no, it would be mighty strange if the raid subsystem just grabbed every
new disk it saw...
Think of what would happen when I insert my camera's compact flash card
and it suddenly gets used as a hot spare <grin>
I'll leave Luca's last word - although it's also worth re-reading Peter's
first words!!
David
> one last word:
> never trust howtos (they should be called howidid), they have the
> tendency to apply to the author configuration, not yours.
> general documentation is far more accurate.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Questions about software RAID
@ 2005-04-19 11:00 bernd
2005-04-19 14:40 ` Hervé Eychenne
0 siblings, 1 reply; 26+ messages in thread
From: bernd @ 2005-04-19 11:00 UTC (permalink / raw)
To: linux-raid
>David wrote:
>>>>
>>>> >5) Removing a disk requires that I do a "mdadm -r" on all the
>>>> >partitions that are involved in a RAID array. I intend to buy a hot-swap
>>>> >capable controller, so what happens if I just pull out the disk without
>>>> >this manual removal command?
>>>> as far as md is concerned the disk disappeared.
>>>> I _think_ this is just like mdadm -r.
>>
>> i think it will be marked faulty, not removed.
>>
>yep - you're right, I remember now.
>You have to mdadm -r remove it and re-add it once you restore the disk.
First you have to look if there are partitions on that disk to which no
data was written since the disk failed (this typically concerns the swap
partition). These partitions have to be marked faulty by hand using mdadm -f
before you can remove them with mdadm -r. If you have SCSI disks you have
to use the following command to take the disk out of the kernel after removing
a faulty disk:
echo "scsi remove-single-device h.c.i.l" >/proc/scsi/scsi
>>> So I could actually just pull out the disk, insert a new one and do a
>>> "mdadm -a /dev/mdX /dev/sdY"?
>>> The RAID system won't detect the newly inserted disk itself?
>
>or:
>no, it would be mighty strange if the raid subsystem just grabbed every
>new disk it saw...
>Think of what would happen when I insert my camera's compact flash card
>and it suddenly gets used as a hot spare <grin>
But if the new disk contains any RAID information and partitions on it
then after spinning it up with something like
echo "scsi add-single-device h.c.i.l" >/proc/scsi/scsi
the RAID system immediately tries to activate those incoming array(s).
We had this yesterday on a SuSE 9.3 system. So be careful when moving used
disks from one system to another (this scenario is actually discussed
in a parallel thread under topic ... uuid...).
>> no, think of it as flexibility. if you want you can build something
>> using the "hotplug" subsystem.
We tried to build "something like a hotplug system" :-). Our hardware
supports this, but in a ratio of 1:10 the kernel (currently 2.6.11-4)
crashes when there is activity on that controller while spinning up the
new disk. We hoped the system would survive with the remaining (second)
controller and the part of the mirrors (RAID1) attached to it, but it
fails in ca. 10% of our attempts. So until now we haven't managed to build
a system based on software RAID with no downtime in case of a disk
failure. But maybe this problem is more related to SCSI than to sw-raid...
Bernd Rieke
^ permalink raw reply [flat|nested] 26+ messages in thread
* Don't use whole disks for raid arrays [was: Questions about software RAID]
2005-04-18 23:12 ` tmp
2005-04-19 6:36 ` Peter T. Breuer
2005-04-19 7:15 ` Luca Berra
@ 2005-04-19 12:08 ` Michael Tokarev
2 siblings, 0 replies; 26+ messages in thread
From: Michael Tokarev @ 2005-04-19 12:08 UTC (permalink / raw)
To: tmp; +Cc: David Greaves, linux-raid
A followup about one single question.
tmp wrote:
[]
> Is it correct that I can use whole disks (/dev/hdb) only if I make a
> partitionable array and thus create the partitions UPON the raid
> mechanism?
Just don't use whole disks for md arrays. *Especially* if you want
to create partitions inside the array. Instead, create a single
partition (/dev/hdb1) - you will waste the first sector on the disk,
but will be much safer. The reason is trivial:
The Linux raid subsystem is designed to leave almost the whole underlying
device, from its very beginning to almost the end, for the data; it
stores its superblock (metadata information) at the *end* of the
device (this way, you can mount e.g. a single component of your
raid1 array without the md layer at all, for recovery purposes).
Whether you use the whole disk, /dev/hdb, for the raid arrays
or not, the kernel will still look at the partition table on the disk.
This table is at the very beginning of it. If the md array is on the
whole disk, the very beginning of the disk is the same as the very
beginning of the array. So the kernel may recognize something written
to the start of the array as a partition table, and "activate"
all the /dev/hdbN devices.
This is especially the case when you create partitions *inside* the
array (md1p1 etc) -- the same partition table (now a valid one) will
be seen in /dev/hdb itself *and* in /dev/md1.
Now, when the kernel has recognized and activated partitions this way,
the partitions will physically reside somewhere inside the array.
For one, it is unsafe to access the partitions, obviously, and
the kernel will not warn about/deny your accesses.
But it gets worse. Suppose you're assembling your arrays by searching
all devices for superblocks. The device you want is /dev/hdb,
but the kernel recognized partitions on it, and now the superblock is
at the end of both /dev/hdb and the last partition on it, say,
/dev/hdb4 -- you're lucky if your raid assembly tools pick
up the right one... (Ok ok, the same applies to normal partitions
as well: it's always an ambiguous choice if your last partition is
part of a raid array, what to choose: the last partition or the
whole disk.)
Also suppose you will later want to boot from this drive, e.g.
because your real boot drive failed - you will have to actually
move your data off by a single sector to free the room for a real
partition table...
To summarize: don't leave the kernel with more than one choice.
It's trivial to avoid the whole issue, along with some other possible bad
sides as yet unknown to me, by just creating a single partition on
the drive and being done with it, once and forever.
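In other words, something like this (hdb/hda/md0 are examples only, and
raid1 is just for illustration):
  fdisk /dev/hdb    # create a single partition /dev/hdb1 spanning the disk
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/hda1 /dev/hdb1
i.e. always build the array from the partitions, never from /dev/hdb itself.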
/mjt
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Questions about software RAID
2005-04-19 8:08 ` David Greaves
@ 2005-04-19 12:18 ` Michael Tokarev
0 siblings, 0 replies; 26+ messages in thread
From: Michael Tokarev @ 2005-04-19 12:18 UTC (permalink / raw)
To: David Greaves; +Cc: linux-raid
David Greaves wrote:
> Luca Berra wrote:
>
>> many people find it easier to understand if raid partitions are set to
>> 0XFD. kernel autodetection is broken and should not be relied upon.
>
> Could you clarify what is broken?
> I understood that it was simplistic (i.e. if you have a raid0 built over
> a raid5 or something exotic then it may have problems) but essentially worked.
> Could it be :
> * broken for complex raid on raid
> * broken for root devices
> * fine for 'simple', non-root devices
It works when everything works. If something does not work (your disk
died, you moved disks, or especially if you added another disk from another
machine which was also a part of (another) raid array), every bad
thing can happen, from just the inability to assemble the array at all,
to using the wrong disks/partitions, and to assembling the wrong
array (the one from another machine). If it's your root device
you're trying to assemble, recovery involves booting from a rescue
CD and cleaning stuff up, which can be problematic at times.
/mjt
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Questions about software RAID
2005-04-19 11:00 bernd
@ 2005-04-19 14:40 ` Hervé Eychenne
2005-04-19 15:27 ` David Greaves
0 siblings, 1 reply; 26+ messages in thread
From: Hervé Eychenne @ 2005-04-19 14:40 UTC (permalink / raw)
To: bernd; +Cc: linux-raid
On Tue, Apr 19, 2005 at 01:00:11PM +0200, bernd@rhm.de wrote:
> >You have to mdadm -r remove it and re-add it once you restore the disk.
> First you have to look if there are partitions on that disk to which no
> data was written since the disk failed (this typically concerns the swap
> partition). These partitions have to be marked faulty by hand using mdadm -f
> before you can remove them with mdadm -r.
Ok, but how do you automate/simplify that?
A script with a while loop and some grep,sed commands? A grep on what
exactly? (this kind of precise information seems to be written nowhere in
the manpage or the HOWTOs)
Wouldn't it be much simpler if it were possible to do something
like the following?
# mdadm --remove-disk /dev/sda
So this command could mark as faulty and remove from the array any
involved partition(s) of the disk to be removed.
Does this currently exist? If not, would you be willing to integrate a patch
to that effect? It would be much simpler, don't you think?
Same thing for addition...
# mdadm --add-disk /dev/sda
would do the job quite automatically...
Herve
--
_
(°= Hervé Eychenne
//) Homepage: http://www.eychenne.org/
v_/_ WallFire project: http://www.wallfire.org/
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Questions about software RAID
2005-04-19 14:40 ` Hervé Eychenne
@ 2005-04-19 15:27 ` David Greaves
2005-04-19 15:54 ` Hervé Eychenne
0 siblings, 1 reply; 26+ messages in thread
From: David Greaves @ 2005-04-19 15:27 UTC (permalink / raw)
To: rv; +Cc: bernd, linux-raid
Hervé Eychenne wrote:
> On Tue, Apr 19, 2005 at 01:00:11PM +0200, bernd@rhm.de wrote:
>>First you have to look if there are partitions on that disk to which no
>>data was written since the disk failed (this typically concerns the swap
>>partition). These partitions have to be marked faulty by hand using mdadm -f
>>before you can remove them with mdadm -r.
>
>
> Ok, but how do you automate/simplify that?
EVMS?
Or some other enterprise volume manager
>
> A script with a while loop and some grep,sed commands? A grep on what
> exactly? (this kind of precise information seems to be written nowhere in
> the manpage or the HOWTOs)
You're talking about specific configs - not all sysadmins will want to
do this.
And those who do can type:
fdisk -l /dev/sda | grep -i fd | cut -f1 -d' ' | xargs -n1 mdadm -r
> Wouldn't it be much simpler if it could be possible to do something
> like the following?
> # mdadm --remove-disk /dev/sda
> So this command could mark as faulty and remove of the array any
> implied partition(s) of the disk to be removed.
see above 1 liner...
David
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Questions about software RAID
2005-04-19 15:27 ` David Greaves
@ 2005-04-19 15:54 ` Hervé Eychenne
2005-04-19 16:53 ` Frank Wittig
0 siblings, 1 reply; 26+ messages in thread
From: Hervé Eychenne @ 2005-04-19 15:54 UTC (permalink / raw)
To: David Greaves; +Cc: bernd, linux-raid
On Tue, Apr 19, 2005 at 04:27:14PM +0100, David Greaves wrote:
> Hervé Eychenne wrote:
> >On Tue, Apr 19, 2005 at 01:00:11PM +0200, bernd@rhm.de wrote:
> >>First you have to look if there are partitions on that disk to which no
> >>data was written since the disk failed (this typically concerns the swap
> >>partition). These partitions have to be marked faulty by hand using mdadm
> >>-f
> >>before you can remove them with mdadm -r.
> >
> >
> >Ok, but how do you automate/simplify that?
> EVMS?
I haven't tried that yet. But I currently have a RAID1 setup with
mdadm at hand, and I must deal with it...
> Or some other enterprise volume manager
No, thanks. ;-)
I prefer taking the time to improve free (as in speech) tools rather than
turning to other solutions.
> >A script with a while loop and some grep,sed commands? A grep on what
> >exactly? (this kind of precise information seems to be written nowhere in
> >the manpage or the HOWTOs)
> You're talking about specific configs - not all sysadmins will want to
> do this.
Of course not all sysadmins will want to do this, but that's not
really the question... The question is "why not provide something simple
to those who want it?"
> And those who do can type:
> fdisk -l /dev/sda | grep -i fd | cut -f1 -d' ' | xargs -n1 mdadm -r
I really don't like kludgy things like that...
What if the string fd is present in another line? No, that's really
ugly, sorry. Ok, maybe you'll come up one day with a better command
line. But then it will be too complex to remember. So you'll tell me to
save it in a script. But that script will stay a bit kludgy anyway
and it will not be present on every Linux box.
Isn't the insertion/removal of a disk common enough to justify the
addition of a simple and clean mdadm option?
Herve
--
_
(°= Hervé Eychenne
//) Homepage: http://www.eychenne.org/
v_/_ WallFire project: http://www.wallfire.org/
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Questions about software RAID
2005-04-19 15:54 ` Hervé Eychenne
@ 2005-04-19 16:53 ` Frank Wittig
2005-04-19 17:54 ` Hervé Eychenne
0 siblings, 1 reply; 26+ messages in thread
From: Frank Wittig @ 2005-04-19 16:53 UTC (permalink / raw)
To: rv; +Cc: linux-raid
Hervé Eychenne wrote:
>>And those who do can type:
>> fdisk -l /dev/sda | grep -i fd | cut -f1 -d' ' | xargs -n1 mdadm -r
>
>
> I really don't like kludgy things like that...
[...]
> Isn't the insertion/removal of a disk common enough to justify the
> addition of a simple and clean mdadm option?
have you thought about the idea that there is a certain rationale behind the
actual behaviour of the mdadm tool?
is it so annoying to you to mark a disk/partition as faulty before
removing it?
do you think it makes sense to implement every single case in an extra
command line option?
did you ever think about switching to hardware where you can remove
and add disks without having to do anything else than pull the old one
out and push the new one in?
i run several raid arrays on many machines and i find the tools quite
useful. if you mind such command lines like the one above you should
think about switching to a microsoft product where you can push your
mouse around and tell everyone that you can do what you want without
those kludgy command lines which no one really understands.
so please ask and learn.
there are many people on this list who are pleased to answer your
questions.
the idea behind *n?x systems is to combine simple functionality through
pipes and redirects to gain unlimited complexity and power. so if you want
to use the full power of *n?x systems you have to get used to these
"kludgy" command lines.
the more you get used to it, the less "kludgy" they will be.
SCNR
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Questions about software RAID
2005-04-19 16:53 ` Frank Wittig
@ 2005-04-19 17:54 ` Hervé Eychenne
2005-04-19 19:46 ` Frank Wittig
0 siblings, 1 reply; 26+ messages in thread
From: Hervé Eychenne @ 2005-04-19 17:54 UTC (permalink / raw)
To: Frank Wittig; +Cc: linux-raid
On Tue, Apr 19, 2005 at 06:53:52PM +0200, Frank Wittig wrote:
> >>And those who do can type:
> >> fdisk -l /dev/sda | grep -i fd | cut -f1 -d' ' | xargs -n1 mdadm -r
> >
> >
> > I really don't like kludgy things like that...
> [...]
> > Isn't the insertion/removal of a disk common enough to justify the
> > addition of a simple and clean mdadm option?
> have you thought about the idea that there is a certain rationale behind the
> actual behaviour of the mdadm tool?
> is it so annoying to you to mark a disk/partition as faulty before
> removing it?
I'm sorry, but having to do a cat /proc/mdstat, figure out by myself
what to do (which partition is concerned), then type several commands
(for each concerned partition) actually is painful.
Maybe you are an experienced guy so it seems so simple to you... but
I'm always amused when an experienced guy refuses to make things
simpler for those who aren't as experienced as he is. And sends them to
Microsoft. Great.
This mailing-list is probably full of kernel guys, so maybe I should
have guessed. But I come here as a user (who wants RAID to work as
smoothly as possible), having found no other mailing-list (a user one)
for RAID on Linux. (did I miss it?)
Maybe I'm not asking questions to the right people, but for me,
computer science is about automating things. And the process of
replacing a crashed disk (described above) on a system managed with
mdadm is not particularly automated, right?
Maybe it's not a problem for _you_, because you know exactly what to
do by heart. So you've forgotten the complexity. But it's there, and
even if it's good that you can do complex and powerful things, it's
not normal to force people to get into that complexity to do simple
things. Think about it.
> do you think it makes sense to implement every single case in an extra
> command line option?
I personally do not consider that this is yet another case. For me,
RAID is about having disk availability, right? So the most common
production case is definitely when one of your disks crashed, and you
want to replace it.
There must be some kind of way to deal with that without typing too
many contextual command lines.
Whether this simple way should belong to mdadm is another question, but
I personally think it should, as it would introduce no overhead
(would it, really?) and would be very helpful. Let me reassure you,
you could stay with several commands if you like. :-)
> did you ever think about switching to hardware where you can remove
> and add disks without having to do anything else than pull the old one
> out and push the new one in?
Ok, here we are...
[First, the RAID controller I'm forced to deal with has no Linux
driver, but that's not important for our discussion.]
Software RAID is about doing the same thing as hardware RAID, but in software.
I think we agree on that. ;-)
So I see absolutely no reason why software RAID should not be as
simple as possible. And RAID management with mdadm could be made
simpler for a common case like that.
> i run several raid arrays on many machines and i find the tools quite
> useful.
They are. They could be even more so if things were as simple as
possible.
> if you mind such command lines like the one above you should
> think about switching to a microsoft product where you can push your
> mouse around and tell everyone that you can do what you want without
> those kludgy command lines which no one really understands.
> so please ask and learn.
You tell me to ask and learn from kernel guys who like to type command
lines (I do, but I don't want to force everyone to do so). So maybe I
can tell you to please learn from users who like the command line, but
try to make simple things as simple as possible.
> there are many people on this list who are pleased to answer your
> questions.
> the idea behind *n?x systems is to combine simple functionality through
> pipes and redirects to gain unlimited complexity and power. so if you want
> to use the full power of *n?x systems you have to get used to these
> "kludgy" command lines.
I don't agree with that. Using grep on vague patterns is not
what I call power. Having to type several commands when one would
be enough (I insist that I think we are talking about one of the most
common cases) is not powerful, in my opinion.
My motto is "be as complex as possible for people who want power
(you, and sometimes me), but be as simple as possible for people who
just want things to be done quickly, simply, and efficiently (sometimes
me, and all the others)".
> the more you get used to it, the less "kludgy" they will be.
Of course, but the very idea is that one shouldn't have to get used to
it too much to perform simple and common actions.
But I guess we'll never agree anyway... :-(
Herve
--
_
(°= Hervé Eychenne
//) Homepage: http://www.eychenne.org/
v_/_ WallFire project: http://www.wallfire.org/
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Questions about software RAID
2005-04-19 17:54 ` Hervé Eychenne
@ 2005-04-19 19:46 ` Frank Wittig
2005-04-20 4:15 ` Guy
0 siblings, 1 reply; 26+ messages in thread
From: Frank Wittig @ 2005-04-19 19:46 UTC (permalink / raw)
To: rv; +Cc: linux-raid
Hervé Eychenne wrote:
>Maybe you are an experienced guy so it seems so simple to you... but
>I'm always amused when an experienced guy refuses to make things
>simpler for those who aren't as experienced as he is. And sends them to
>Microsoft. Great.
>
>
i don't send you to microsoft. i want you to understand the philosophy
behind linux.
the sort of functionality you want doesn't belong in the mdadm tool.
mdadm is the command line interface to md.
more complex functionality like the one you desire is covered by
frontends like EVMS.
i don't know EVMS but what i heard about it sounds exactly like what you
want: simple administration without having to know how things work in
the background.
>I personally do not consider that this is yet another case. For me,
>RAID is about having disk availability, right? So the most common
>production case is definitely when one of your disks crashed, and you
>want to replace it.
>There must be some kind of way to deal with that without typing too
>many contextual command lines.
>
>
after the first time you lose data because of a failure of an automated
process you will think differently about that.
i think automation is fine for normal operation.
failure of a component is far from normal and in this case full control
is what you want/need.
>Whether this simple way should belong to mdadm is another question, but
>I personally think it should, as it would introduce no overhead
>(would it, really?) and would be very helpful.
>
>
KISS: keep it stupid simple
this is the philosophy. keep low-level tools stupid simple. more
complexity brings a higher risk of failure.
we're talking about raid. not about doing backups or syncing the system
clock.
>>did you ever think about switching to hardware where you can remove
>>and add disks without having to do anything else than pull the old one
>>out and push the new one in?
>>
>>
>Ok, here we are...
>[First, the RAID controller I'm forced to deal with has no Linux
>driver, but that's not important for our discussion.]
>
>
there are some nice boxes around. they take a bunch of disks and appear
to the host as a simple SCSI disk. i had such a thing in the past.
replacing disks was so simple a secretary could have done that. ;-)
>I don't agree with that. Using grep on vague patterns is not
>
>
i think grep is far more powerful than you think.
>>the more you get used to it, the less "kludgy" they will be.
>>
>>
>Of course, but the very idea is that one shouldn't have to get used to
>it too much to perform simple and common actions.
>
>
if replacing disks is a common case to you, you should buy your disks
from a different manufacturer. ;-)
and if you have so many arrays that a disk failure is common because of
the number of disks, you would want to know the basics.
>But I guess we'll never agree anyway... :-(
>
>
we're just on different levels of usage. and there's a tool for each
of us.
my tool is mdadm.
and yours is EVMS or some other high-level frontend which abstracts the
use of the low-level tools behind a nice looking UI.
greetings,
Frank
^ permalink raw reply [flat|nested] 26+ messages in thread
* RE: Questions about software RAID
2005-04-19 19:46 ` Frank Wittig
@ 2005-04-20 4:15 ` Guy
2005-04-20 7:59 ` David Greaves
2005-04-20 15:49 ` Martin K. Petersen
0 siblings, 2 replies; 26+ messages in thread
From: Guy @ 2005-04-20 4:15 UTC (permalink / raw)
To: 'Frank Wittig', rv; +Cc: linux-raid
> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> owner@vger.kernel.org] On Behalf Of Frank Wittig
> Sent: Tuesday, April 19, 2005 3:47 PM
> To: rv@eychenne.org
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Questions about software RAID
>
> Hervé Eychenne wrote:
>
> >Maybe you are an experienced guy so it seems so simple to you... but
> >I'm always amused when an experienced guy refuses to make things
> >simpler for those who aren't as experienced as he is. And sends them to
> >Microsoft. Great.
> >
> >
> i don't send you to microsoft. i want you to understand the philosophy
> behind linux.
> the sort of functionality you want doesn't belong in the mdadm tool.
> mdadm is the command line interface to md.
> more complex functionality like the one you desire is covered by
> frontends like EVMS.
> i don't know EVMS but what i heard about it sounds exactly like what you
> want: simple administration without having to know how things work in
> the background.
>
> >I personally do not consider that this is yet another case. For me,
> >RAID is about having disk availability, right? So the most common
> >production case is definitely when one of your disks crashed, and you
> >want to replace it.
> >There must be some kind of way to deal with that without typing too
> >many contextual command lines.
> >
> >
> after the first time you lose data because of a failure of an automated
> process you will think differently about that.
> i think automation is fine for normal operation.
> failure of a component is far from normal and in this case full control
> is what you want/need.
>
> >Whether this simple way should belong to mdadm is another question, but
> >I personally think it should, as it would introduce no overhead
> >(would it, really?) and would be very helpful.
> >
> >
> KISS: keep it stupid simple
> this is the philosophy. keep low-level tools stupid simple. more
> complexity brings a higher risk of failure.
> we're talking about raid. not about doing backups or syncing the system
> clock.
>
> >>did you ever think about switching to hardware where you can remove
> >>and add disks without having to do anything else than pull the old one
> >>out and push the new one in?
> >>
> >>
> >Ok, here we are...
> >[First, the RAID controller I'm forced to deal with has no Linux
> >driver, but that's not important for our discussion.]
> >
> >
> there are some nice boxes around. they take a bunch of disks and appear
> to the host as a simple SCSI disk. i had such a thing in the past.
> replacing disks was so simple a secretary could have done that. ;-)
>
> >I don't agree with that. Using grep on vague patterns is not
> >
> >
> i think grep is far more powerful than you think.
>
> >>the more you get used to it, the less "kludgy" they will be.
> >>
> >>
> >Of course, but the very idea is that one shouldn't have to get used to
> >it too much to perform simple and common actions.
> >
> >
> if replacing disks is a common case to you, you should buy your disks
> from a different manufacturer. ;-)
> and if you have so many arrays that a disk failure is common because of
> the number of disks, you would want to know the basics.
>
> >But I guess we'll never agree anyway... :-(
> >
> >
> we're just on different levels of usage. and there's a tool for each
> of us.
> my tool is mdadm.
> and yours is EVMS or some other high-level frontend which abstracts the
> use of the low-level tools behind a nice looking UI.
>
> greetings,
> Frank
Well, I agree with KISS, but from the operator's point of view!
I want the failed disk to light a red LED.
I want the tray the disk is in to light a red LED.
I want the cabinet the tray is in to light a red LED.
I want the re-build to the spare to start.
I want the operator du jour to notice the red LEDs.
I want the operator to remove the failed disk.
I want the operator to install the new disk.
I want the re-build to the new disk to start.
I want the re-build to not fail the current spare so data stays redundant.
I want the old spare to become the spare again. (optional)
The operator would log the event:
"Disk xyz's LED went red, I replaced the disk, the red LED went out."
In my opinion, most operators would not be able to replace a disk on an md
RAID system. It is much too complex! Most operators need written
procedures. They can't use independent thought to resolve problems.
Also, most operators can't use vi! So, if you can use vi, you are better
than most operators!!! IMO.
Of course I can't have the red LED, but the disks could be labeled and an
email sent saying disk xyz has failed. The operator could then replace disk
xyz, if they could find it! Then another email(s) with a status update.
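(At least the email part exists today - something like the following, with
the address as a placeholder, will mail out Fail/FailSpare/RebuildFinished
events for all arrays:
  mdadm --monitor --scan --daemonise --mail=operators@example.com
The LEDs are another story.)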
With most (maybe all) hardware RAID systems I have used, the above is how it
works. Even the red LED!!! But these are dedicated RAID systems, not off
the shelf components.
I don't expect a software solution to ever be as easy as hardware, but I do
agree it needs to be much more operator friendly than it is today.
Guy
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Questions about software RAID
2005-04-20 4:15 ` Guy
@ 2005-04-20 7:59 ` David Greaves
2005-04-20 9:26 ` Molle Bestefich
2005-04-20 15:49 ` Martin K. Petersen
1 sibling, 1 reply; 26+ messages in thread
From: David Greaves @ 2005-04-20 7:59 UTC (permalink / raw)
To: Guy; +Cc: 'Frank Wittig', rv, linux-raid
Guy wrote:
> Well, I agree with KISS, but from the operator's point of view!
>
> I want... <snip>
Fair enough.
But I think the point is - should you expect the mdadm command to do all
that?
or do you think that it would make sense to stick with a layered
approach that allows anything from my Zaurus PDA to an S390 mainframe to
use basic md - with the S390 probably layering some management sw like
EVMS over the top of md/mdadm.
The *nix command line tool philosophy has generally been "do one thing
and do it well". It does provide some confusion for newbies when they
see a collection of tools - but try running the one-liner - or numerous
others like it - on OSes or tools that take the monolithic approach.
Also - should the LED control code be built into mdadm? And should Neil
write it?
Is that for Dell controllers? or IBM ones? or SGI ones? what about my
homemade parallel port one? What about controlling the LEDs on my PDA?
Or should it be a separate bit of code that needs a wrapper script and
plugs in to a modular system like - you guessed it - EVMS.
And Guy, I know what *you* think :)
And I think the EVMS folk would accept patches for your suggestions -
including any LED control ones.
I do think you would need to ask Neil to support
mdadm --sync-/dev/sdc-to-replace-/dev/sdg-even-though-/dev/sdg-is-fine
mdadm --use-/dev/sdc-and-make-/dev/sdg-spare
which would be especially useful if /dev/sdg were part of a shared
spares pool.
David
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Questions about software RAID
2005-04-20 7:59 ` David Greaves
@ 2005-04-20 9:26 ` Molle Bestefich
2005-04-20 9:32 ` Hervé Eychenne
2005-04-20 11:16 ` Peter T. Breuer
0 siblings, 2 replies; 26+ messages in thread
From: Molle Bestefich @ 2005-04-20 9:26 UTC (permalink / raw)
To: linux-raid
David Greaves wrote:
> Guy wrote:
> > Well, I agree with KISS, but from the operator's point of view!
> >
> > I want...
> [snip]
>
> Fair enough.
[snip]
> should the LED control code be built into mdadm?
Obviously not.
But currently, a LED control app would have to pull information from
/proc/mdstat, right?
mdstat is a crappy place to derive any state from.
It currently seems to have a dual purpose:
- being a simple textual representation of RAID state for the user.
- providing MD state information for userspace apps.
That's not good.
There seems to be an obvious lack of a properly thought-out interface
to notify userspace applications of MD events (disk failed --> go
light a LED, etc.).
Please correct me if I'm on the wrong track, in which case the rest of
this posting will be bogus. Maybe there are ioctls or such that I'm
not aware of.
I'm not sure how a proper interface could be done (so I'm basically
just blabbering). ACPI has some sort of event system, but the MD one
would need to be more flexible. For instance, userspace apps have to
pick up on MD events such as disk failures even if they happen not to
be running at the exact moment the event occurs (due to a system
restart, a daemon restart or whatnot). So the system that ACPI uses is
probably unsuited.
Perhaps a simple logfile would do. Its focus should be
machine-readability (vs. human readability for mdstat). A userspace
app could follow MD's state from the beginning (bootup, no devices
discovered, logfile cleared), through device discovery and RAID
assembly, and on to failing devices. By adding up the information in
all the log lines, a userspace app could derive the current state of
MD (which disks are dead, and so on).
Just a thought.
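Until something like that exists, a monitoring app really is stuck
scraping /proc/mdstat. A minimal sketch of that scraping, assuming the
usual "(F)" marker on failed members:

    awk '/^md/ {
            for (i = 5; i <= NF; i++)
                if ($i ~ /\(F\)/)
                    printf "%s: failed member %s\n", $1, $i
         }' /proc/mdstat

It works, but it is exactly the kind of fragile text parsing the above
complaint is about.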
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Questions about software RAID
2005-04-20 9:26 ` Molle Bestefich
@ 2005-04-20 9:32 ` Hervé Eychenne
2005-04-20 17:36 ` Molle Bestefich
2005-04-20 11:16 ` Peter T. Breuer
1 sibling, 1 reply; 26+ messages in thread
From: Hervé Eychenne @ 2005-04-20 9:32 UTC (permalink / raw)
To: Molle Bestefich; +Cc: linux-raid
On Wed, Apr 20, 2005 at 11:26:28AM +0200, Molle Bestefich wrote:
> David Greaves wrote:
> > Guy wrote:
> > > Well, I agree with KISS, but from the operator's point of view!
> > >
> > > I want...
> > [snip]
> >
> > Fair enough.
> [snip]
> > should the LED control code be built into mdadm?
> Obviously not.
> But currently, a LED control app would have to pull information from
> /proc/mdstat, right?
> mdstat is a crappy place to derive any state from.
> It currently seems to have a dual purpose:
> - being a simple textual representation of RAID state for the user.
> - providing MD state information for userspace apps.
> That's not good.
I could not agree more.
> There seems to be an obvious lack of a properly thought out interface
> to notify userspace applications of MD events (disk failed --> go
> light a LED, etc).
> Please correct me if I'm on the wrong track, in which case the rest of
> this posting will be bogus. Maybe there are IOCTLs or such that I'm
> not aware of.
> I'm not sure how a proper interface could be done (so I'm basically
> just blabbering). ACPI has some sort of event system, but the MD one
> would need to be more flexible. For instance userspace apps has to
> pick up on MD events such as disk failures, even if the userspace app
> happens to not be running in the exact moment that the event occurs
> (due to system restart, daemon restart or what not). So the system
> that ACPI uses is probably unsuited.
> Perhaps a simple logfile would do.
No, as it requires active polling.
I think something like a netlink device would be more accurate, but I'm
not a kernel guru.
Herve
--
_
(°= Hervé Eychenne
//) Homepage: http://www.eychenne.org/
v_/_ WallFire project: http://www.wallfire.org/
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Questions about software RAID
2005-04-20 9:26 ` Molle Bestefich
2005-04-20 9:32 ` Hervé Eychenne
@ 2005-04-20 11:16 ` Peter T. Breuer
2005-04-20 12:34 ` Lars Marowsky-Bree
1 sibling, 1 reply; 26+ messages in thread
From: Peter T. Breuer @ 2005-04-20 11:16 UTC (permalink / raw)
To: linux-raid
Molle Bestefich <molle.bestefich@gmail.com> wrote:
> There seems to be an obvious lack of a properly thought out interface
> to notify userspace applications of MD events (disk failed --> go
> light a LED, etc).
Well, that's probably truish. I've been meaning to ask for a per-device
sysctl interface for some time.
Peter
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Questions about software RAID
2005-04-20 11:16 ` Peter T. Breuer
@ 2005-04-20 12:34 ` Lars Marowsky-Bree
0 siblings, 0 replies; 26+ messages in thread
From: Lars Marowsky-Bree @ 2005-04-20 12:34 UTC (permalink / raw)
To: linux-raid
On 2005-04-20T13:16:24, "Peter T. Breuer" <ptb@lab.it.uc3m.es> wrote:
> > There seems to be an obvious lack of a properly thought out interface
> > to notify userspace applications of MD events (disk failed --> go
> > light a LED, etc).
> Well, that's probably truish. I've been meaning to ask for a per-device
> sysctl interface for some time.
sysctl? Surely you mean sysfs... ;-)
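For what it's worth, a sketch of what reading such a sysfs view could
look like; the attribute names below are the ones later kernels ended up
exporting under /sys/block/mdX/md/ and may be absent or spelled
differently on any given tree:

    cat /sys/block/md0/md/array_state      # e.g. clean, active, read-auto
    cat /sys/block/md0/md/degraded         # how many members are missing
    for d in /sys/block/md0/md/dev-*; do
        printf '%s: %s\n' "${d##*/dev-}" "$(cat "$d/state")"   # in_sync, faulty, spare
    done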
Sincerely,
Lars Marowsky-Brée <lmb@suse.de>
--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Questions about software RAID
2005-04-20 4:15 ` Guy
2005-04-20 7:59 ` David Greaves
@ 2005-04-20 15:49 ` Martin K. Petersen
2005-04-21 1:21 ` Guy
1 sibling, 1 reply; 26+ messages in thread
From: Martin K. Petersen @ 2005-04-20 15:49 UTC (permalink / raw)
To: Guy; +Cc: 'Frank Wittig', rv, linux-raid
>>>>> "Guy" == Guy <bugzilla@watkins-home.com> writes:
Guy> I want the failed disk to light a red LED.
Guy> I want the tray the disk is in to light a red LED.
Guy> I want the cabinet the tray is in to light a red LED.
That's easy when you have a custom hardware RAID enclosure that you
have control over. As you suggest yourself, it's not easy when you
have off-the-shelf components.
What happens in "real" storage systems is that the SCSI bus is
monitored by a SAF-TE or SES chip. The OS (in this case the RAID
controller firmware) will talk to the SAF-TE device or access the SES
page to get information about hot swap events, failed disks, stopped
fans, busted power supply, etc.
I messed with a daemon to monitor enclosures implementing either of
these two standards during the infancy of hotplug. I should probably
look into that again. But obviously this would only apply to disk
trays with suitable monitoring hardware.
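For enclosures that do speak SES, sg_ses from sg3_utils already gives
command-line access to much of this; roughly (device name and slot index
purely illustrative, option names as in current sg3_utils):

    sg_ses /dev/sg3                           # dump the enclosure status pages
    sg_ses --index=4 --set=fault /dev/sg3     # light the fault LED for slot 4
    sg_ses --index=4 --clear=fault /dev/sg3   # and clear it again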
Guy> I want the re-build to the new disk to start.
Are you sure? How do you know that the disk you just inserted is
something you want to use for the RAID? What if you hook up - say - a
USB storage device to back up data before you start messing with
things? You most definitely don't want the RAID to start scribbling
over any random device you hook up to a system with a failed RAID
device.
In the HW RAID enclosure case that's easy - again because the whole
tray is under the array firmware's control.
Defining a generic resync-on-hotplug policy is not trivial. One
policy that might work for most people is to sync if a new disk is
inserted at the same address (SCSI controller, channel, id, lun). But
there's no one-size-fits-all policy.
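As a crude sketch of what that one policy could look like in practice, a
helper script that the hotplug mechanism of your choice could invoke with
the name of the newly appeared disk; the array name, the argument
convention and the policy itself are all assumptions here, not anything
md provides:

    #!/bin/sh
    # md-readd: re-add a replacement disk that appeared at the address of
    # the failed one.  Partitioning and sanity checks deliberately omitted.
    DEV="$1"                # e.g. /dev/sdg, supplied by the hotplug hook
    ARRAY=/dev/md0

    # crude degraded test: a missing member shows up as '_' in the [UU_] status
    grep -q '_' /proc/mdstat || exit 0

    mdadm "$ARRAY" --add "$DEV"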
And this is not just because Linux sucks. It's simply that a lot of
the "easy" HW RAID features are a result of appropriately designed
hardware.
We can certainly make Linux work more smoothly on hardware that allows
for monitoring and predictable addressing, etc. But in the low end
it'll have to be a policy defined by the sysadmin. And we probably
want to leave it a sysadmin configurable policy even if the hardware
implements the required magic.
--
Martin K. Petersen Wild Open Source, Inc.
mkp@wildopensource.com http://www.wildopensource.com/
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Questions about software RAID
2005-04-20 9:32 ` Hervé Eychenne
@ 2005-04-20 17:36 ` Molle Bestefich
0 siblings, 0 replies; 26+ messages in thread
From: Molle Bestefich @ 2005-04-20 17:36 UTC (permalink / raw)
To: rv; +Cc: linux-raid
Hervé Eychenne wrote:
> Molle Bestefich wrote:
> > There seems to be an obvious lack of a properly thought out interface
> > to notify userspace applications of MD events (disk failed --> go
> > light a LED, etc).
> >
> > I'm not sure how a proper interface could be done (so I'm basically
> > just blabbering). ACPI has some sort of event system, but the MD one
> > would need to be more flexible. For instance userspace apps has to
> > pick up on MD events such as disk failures, even if the userspace app
> > happens to not be running in the exact moment that the event occurs
> > (due to system restart, daemon restart or what not). So the system
> > that ACPI uses is probably unsuited.
> >
> > Perhaps a simple logfile would do. It's focus should be
> > machine-readability (vs. human readability for mdstat). A userspace
> > app could follow MD's state from the beginning (bootup, no devices
> > discovered, logfile cleared), through device discovery and RAID
> > assembly and to failing devices. By adding up the information in all
> > the log lines, a userspace app could derive the current state of MD
> > (which disks are dead..).
>
> No, as it requires active polling.
No it doesn't.
Just tail -f the logfile (or /proc/xxxx or /sys/xxxx "file"), and your
app will receive due notice exactly when something happens. Or use
inotify.
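For completeness: mdadm can already hand events to a userspace program
via its monitor mode (it polls /proc/mdstat internally, but the handler
only runs when something happens); a sketch, where
/usr/local/bin/md-event is a hypothetical handler that gets the event
name, the array, and (when relevant) the member device as arguments:

    mdadm --monitor --scan --daemonise --program /usr/local/bin/md-event

That doesn't give you the historical log discussed below, but it does
cover the "notify me when something happens" case.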
> I think something like a netlink device would be more accurate,
> but I'm not a kernel guru.
No idea how that works :-).
If by "accurate" you mean you'll get a faster reaction, that's wrong
as per above explanation. And I'll try to explain why a logfile in
other respects are actually _more_ accurate.
I can see why a logfile _seems_ wrong at first sight.
But the idea that it allows you to (*also*!) see historic MD events
instead of just the current status this instant seems compelling.
- You can be sure that you haven't missed or lost any MD events. If
your monitoring app crashes or restarts, just look in the log. (If
you're unsure whether you've notified the admin of some event or not:
I'm sure MD could log the disks' event counters, and the monitoring app
could keep its own "how far have I gotten" event counter [on disk], so
the app knows its own status.)
- If the log resides in e.g. /proc/whatever, you can pipe it to an
actual file. That could be pretty useful for debugging MD (attach your
MD log, send a mail asking "what happened", and it'll be clear to the
super-md-dude at first sight).
- Seems more convincing to enterprise customers that you can actually
see MD's every move in the log. Makes it seem much more robust and
reliable.
- Really useful for debugging the monitoring app....
- Probably other advantages... Haven't really thought it through
that well :-).
The problem, as I see it, is whether it's worth the implementation
trouble (is it any harder to implement than a netlink or similar
interface? No idea!)
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 26+ messages in thread
* RE: Questions about software RAID
2005-04-20 15:49 ` Martin K. Petersen
@ 2005-04-21 1:21 ` Guy
0 siblings, 0 replies; 26+ messages in thread
From: Guy @ 2005-04-21 1:21 UTC (permalink / raw)
To: 'Martin K. Petersen'; +Cc: 'Frank Wittig', rv, linux-raid
> From: Martin K. Petersen [mailto:mkp@mkp.net]
> Sent: Wednesday, April 20, 2005 11:49 AM
> To: Guy
> Cc: 'Frank Wittig'; rv@eychenne.org; linux-raid@vger.kernel.org
> Subject: Re: Questions about software RAID
>
> >>>>> "Guy" == Guy <bugzilla@watkins-home.com> writes:
>
> Guy> I want the failed disk to light a red LED.
> Guy> I want the tray the disk is in to light a red LED.
> Guy> I want the cabinet the tray is in to light a red LED.
>
> That's easy when you have a custom hardware RAID enclosure that you
> have control over. As you suggest yourself, it's not easy when you
> have off the shelf components.
>
> What happens in "real" storage systems is that the SCSI bus is
> monitored by a SAF-TE or SES chip. The OS (in this case the RAID
> controller firmware) will talk to the SAF-TE device or access the SES
> page to get information about hot swap events, failed disks, stopped
> fans, busted power supply, etc.
>
> I messed with a daemon to monitor enclosures implementing either of
> these two standards during the infancy of hotplug. I should probably
> look into that again. But obviously this would only apply to disk
> trays with suitable monitoring hardware.
>
>
> Guy> I want the re-build to the new disk to start.
>
> Are you sure? How do you know that the disk you just inserted is
> something you want to use for the RAID? What if you hook up - say - a
> USB storage device to back up data before you start messing with
> things? You most definitely don't want the RAID to start scribbling
> over any random device you hook up to a system with a failed RAID
> device.
Yes, I am sure, but the new disk would be replacing the old disk: same bus,
same slot, same ID/LUN or whatever. This may not be reasonable with all bus
types, but with SCSI/SCA it is. Also, my wish list would need to be defined
when the system is set up; I would not expect all systems to work this way.
It would be fine to have a user interface that indicated a "new" disk was
found and prompted the user for permission to use it.
My wish list would have prerequisites! Maybe EVMS, or some other "special"
layer that can notice a disk has been removed and notice when a different
disk has been installed. Only 1 partition, or full disk. You can't (should
not) pull a disk that has 2 or more partitions just because 1 may be bad!
There may be more prerequisites that I can't foresee. But, assuming I meet
the theoretical prerequisites, I should be able to build a system that can
be maintained by "normal" sysadmins (these admins may be called operators
in some environments). But with today's tools you need a Linux expert to
replace a disk, IMO. And I don't think that is acceptable!
Don't get me wrong!!! I love Linux, but I want improvements and features!
Guy
>
> In the HW RAID enclosure case that's easy - again because the whole
> tray is under the array firmware's control.
>
> Definining a generic resync-on-hotplug policy is not trivial. One
> policy that might work for most people is sync if a new disk is
> inserted on the same address (SCSI controller, channel, id, lun). But
> there's no one size fits all policy.
>
> And this is not just because Linux sucks. It's simply that a lot of
> the "easy" HW RAID features are a result of appropriately designed
> hardware.
>
> We can certainly make Linux work more smoothly on hardware that allows
> for monitoring and predictable addressing, etc. But in the low end
> it'll have to be a policy defined by the sysadmin. And we probably
> want to leave it a sysadmin configurable policy even if the hardware
> implements the required magic.
>
> --
> Martin K. Petersen Wild Open Source, Inc.
> mkp@wildopensource.com http://www.wildopensource.com/
^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2005-04-21 1:21 UTC | newest]
Thread overview: 26+ messages
2005-04-18 19:50 Questions about software RAID tmp
2005-04-18 20:12 ` David Greaves
2005-04-18 23:12 ` tmp
2005-04-19 6:36 ` Peter T. Breuer
2005-04-19 7:15 ` Luca Berra
2005-04-19 8:08 ` David Greaves
2005-04-19 12:18 ` Michael Tokarev
2005-04-19 12:08 ` Don't use whole disks for raid arrays [was: Questions about software RAID] Michael Tokarev
2005-04-18 20:15 ` Questions about software RAID Peter T. Breuer
2005-04-18 20:50 ` Frank Wittig
-- strict thread matches above, loose matches on Subject: below --
2005-04-19 11:00 bernd
2005-04-19 14:40 ` Hervé Eychenne
2005-04-19 15:27 ` David Greaves
2005-04-19 15:54 ` Hervé Eychenne
2005-04-19 16:53 ` Frank Wittig
2005-04-19 17:54 ` Hervé Eychenne
2005-04-19 19:46 ` Frank Wittig
2005-04-20 4:15 ` Guy
2005-04-20 7:59 ` David Greaves
2005-04-20 9:26 ` Molle Bestefich
2005-04-20 9:32 ` Hervé Eychenne
2005-04-20 17:36 ` Molle Bestefich
2005-04-20 11:16 ` Peter T. Breuer
2005-04-20 12:34 ` Lars Marowsky-Bree
2005-04-20 15:49 ` Martin K. Petersen
2005-04-21 1:21 ` Guy