* 0.90 vs 1.X - Differing behavior for device # during fail/remove/add operation
From: Dusty Mabe @ 2013-05-08 17:39 UTC (permalink / raw)
To: linux-raid; +Cc: Dusty Mabe
CentOS 6.3
2.6.32-279.19.1
mdadm-3.2.3-9.el6.x86_64
I have noticed that the device number printed in /proc/mdstat changes if you
fail->remove->add a member device of a 1.X metadata array, whereas for a
0.90 metadata array the number goes back to its original value once
recovery has finished.
This means after recovery has finished I end up with the following
outputs for 1.1 and 0.90 metadata:
1.1 METADATA /proc/mdstat (trimmed)
md50 : active raid1 loop0[2] loop1[1]
32760 blocks super 1.1 [2/2] [UU]
1.1 METADATA mdadm --detail (trimmed)
Number Major Minor RaidDevice State
2 7 0 0 active sync /dev/loop0
1 7 1 1 active sync /dev/loop1
0.90 METADATA /proc/mdstat (trimmed)
md50 : active raid1 loop0[0] loop1[1]
32704 blocks [2/2] [UU]
0.90 METADATA mdadm --detail (trimmed)
Number Major Minor RaidDevice State
0 7 0 0 active sync /dev/loop0
1 7 1 1 active sync /dev/loop1
Is this by design? I know the 1.X metadata versions use dev_roles rather
than this_disk and the mdp_disk_t structure, so maybe it is intentional?
A simple reproducer is below if you want to see it in action. Please
review it first to make sure my assumptions are valid in your case.
Change --meta from 1.1 to 0.90 to see the behavior for the other metadata version.
# Create two sparse disk image files
dd if=/dev/zero of=/tmp/disk1.img bs=1 count=0 seek=32M
dd if=/dev/zero of=/tmp/disk2.img bs=1 count=0 seek=32M
# Set them up as loopback devices
losetup -f /tmp/disk1.img
losetup -f /tmp/disk2.img
# Create a raid 1 out of them
mdadm --create /dev/md50 --level=1 --raid-devices=2 --meta=1.1 \
    /dev/loop0 /dev/loop1
# Fail/remove/zero-superblock/add device back to array
mdadm --fail /dev/md50 /dev/loop0
sleep 1
mdadm --remove /dev/md50 /dev/loop0
sleep 1
mdadm --zero-superblock /dev/loop0
sleep 1
mdadm --add /dev/md50 /dev/loop0
grep -A 2 md50 /proc/mdstat
sleep 10
grep -A 2 md50 /proc/mdstat
# Clean up
mdadm --stop /dev/md50
losetup -d /dev/loop0
losetup -d /dev/loop1
rm -f /tmp/disk1.img
rm -f /tmp/disk2.img
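If you just want to see the bracketed device numbers change without eyeballing the full mdstat output, a small parsing helper can be used. This is only a sketch: the sample lines are hard-coded copies of the outputs above so it runs without a real array; on a live system you could feed it `grep md50 /proc/mdstat` instead.

```shell
# Pull the bracketed device numbers out of an mdstat-style line.
# Hard-coded samples (from the 1.1 and 0.90 outputs above) stand in
# for real /proc/mdstat content so this runs without root or loop devices.
line_11='md50 : active raid1 loop0[2] loop1[1]'
line_090='md50 : active raid1 loop0[0] loop1[1]'
for line in "$line_11" "$line_090"; do
    echo "$line" | grep -o '\[[0-9]*\]' | tr -d '[]' | paste -sd' ' -
done
# Prints "2 1" for the 1.1 array and "0 1" for the 0.90 array.
```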
Thanks for any input!
Dusty
* Re: 0.90 vs 1.X - Differing behavior for device # during fail/remove/add operation
From: Dusty Mabe @ 2013-05-09 15:30 UTC (permalink / raw)
To: linux-raid; +Cc: Dusty Mabe
On Wed, May 8, 2013 at 1:39 PM, Dusty Mabe <dustymabe@gmail.com> wrote:
> CentOS 6.3
> 2.6.32-279.19.1
> mdadm-3.2.3-9.el6.x86_64
>
>
> I have noticed that the device number printed in /proc/mdstat changes if you
> fail->remove->add a member device of a 1.X metadata array, whereas for a
> 0.90 metadata array the number goes back to its original value once
> recovery has finished.
>
>
> Is this by design? I know the 1.X metadata versions use dev_roles rather
> than this_disk and the mdp_disk_t structure, so maybe it is intentional?
>
Hey guys, sorry to be a pain. Has anyone seen this before? Is it by
design, or a known issue?
Dusty
* Re: 0.90 vs 1.X - Differing behavior for device # during fail/remove/add operation
From: NeilBrown @ 2013-05-09 21:29 UTC (permalink / raw)
To: Dusty Mabe; +Cc: linux-raid
On Thu, 9 May 2013 11:30:26 -0400 Dusty Mabe <dustymabe@gmail.com> wrote:
> On Wed, May 8, 2013 at 1:39 PM, Dusty Mabe <dustymabe@gmail.com> wrote:
> > CentOS 6.3
> > 2.6.32-279.19.1
> > mdadm-3.2.3-9.el6.x86_64
> >
> >
> > I have noticed that the device number printed in /proc/mdstat changes if you
> > fail->remove->add a member device of a 1.X metadata array, whereas for a
> > 0.90 metadata array the number goes back to its original value once
> > recovery has finished.
> >
> >
> > Is this by design? I know the 1.X metadata versions use dev_roles rather
> > than this_disk and the mdp_disk_t structure, so maybe it is intentional?
> >
>
> Hey guys, sorry to be a pain. Has anyone seen this before? Is it by
> design, or a known issue?
>
It is an unfortunate consequence of incoherent design.
I've occasionally wondered if I should "fix" it.
NeilBrown
* Re: 0.90 vs 1.X - Differing behavior for device # during fail/remove/add operation
From: Dusty Mabe @ 2013-05-11 0:04 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid, Dusty Mabe
On Thu, May 9, 2013 at 5:29 PM, NeilBrown <neilb@suse.de> wrote:
>
> It is an unfortunate consequence of incoherent design.
> I've occasionally wondered if I should "fix" it.
Neil,
Thanks for the insight. I don't know the code well, but I notice that
changing md_seq_show() to print raid_disk rather than desc_nr at least
gives me the "desired" behavior in /proc/mdstat. It doesn't stop the
"Number" from changing in the mdadm --detail output, but it is a quick,
easy way to fix mdstat without re-architecting anything.
What do you think?
diff --git a/drivers/md/md.c b/drivers/md/md.c
index aeceedf..b47fd35 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7023,7 +7023,7 @@ static int md_seq_show(struct seq_file *seq, void *v)
 	rdev_for_each(rdev, mddev) {
 		char b[BDEVNAME_SIZE];
 		seq_printf(seq, " %s[%d]",
-			   bdevname(rdev->bdev,b), rdev->desc_nr);
+			   bdevname(rdev->bdev,b), rdev->raid_disk);
 		if (test_bit(WriteMostly, &rdev->flags))
 			seq_printf(seq, "(W)");
 		if (test_bit(Faulty, &rdev->flags)) {
* Re: 0.90 vs 1.X - Differing behavior for device # during fail/remove/add operation
From: Dusty Mabe @ 2013-05-14 20:31 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid, Dusty Mabe
On Fri, May 10, 2013 at 8:04 PM, Dusty Mabe <dustymabe@gmail.com> wrote:
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index aeceedf..b47fd35 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -7023,7 +7023,7 @@ static int md_seq_show(struct seq_file *seq, void *v)
>  	rdev_for_each(rdev, mddev) {
>  		char b[BDEVNAME_SIZE];
>  		seq_printf(seq, " %s[%d]",
> -			   bdevname(rdev->bdev,b), rdev->desc_nr);
> +			   bdevname(rdev->bdev,b), rdev->raid_disk);
>  		if (test_bit(WriteMostly, &rdev->flags))
>  			seq_printf(seq, "(W)");
>  		if (test_bit(Faulty, &rdev->flags)) {
Hey Neil, sorry to be a pest. Do you have an opinion on this?
* Re: 0.90 vs 1.X - Differing behavior for device # during fail/remove/add operation
From: NeilBrown @ 2013-05-16 0:41 UTC (permalink / raw)
To: Dusty Mabe; +Cc: linux-raid
On Fri, 10 May 2013 20:04:27 -0400 Dusty Mabe <dustymabe@gmail.com> wrote:
> On Thu, May 9, 2013 at 5:29 PM, NeilBrown <neilb@suse.de> wrote:
> >
> > It is an unfortunate consequence of incoherent design.
> > I've occasionally wondered if I should "fix" it.
>
> Neil,
>
> Thanks for the insight. I don't know the code well, but I notice that
> changing md_seq_show() to print raid_disk rather than desc_nr at least
> gives me the "desired" behavior in /proc/mdstat. It doesn't stop the
> "Number" from changing in the mdadm --detail output, but it is a quick,
> easy way to fix mdstat without re-architecting anything.
>
> What do you think?
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index aeceedf..b47fd35 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -7023,7 +7023,7 @@ static int md_seq_show(struct seq_file *seq, void *v)
>  	rdev_for_each(rdev, mddev) {
>  		char b[BDEVNAME_SIZE];
>  		seq_printf(seq, " %s[%d]",
> -			   bdevname(rdev->bdev,b), rdev->desc_nr);
> +			   bdevname(rdev->bdev,b), rdev->raid_disk);
>  		if (test_bit(WriteMostly, &rdev->flags))
>  			seq_printf(seq, "(W)");
>  		if (test_bit(Faulty, &rdev->flags)) {
The problem with doing this is that it is potentially an API change.
It is unlikely but possible that some script depends on the current meaning
of the number.
Also it would result in spares being reported as e.g.
sda1[-1]S
as 'raid_disk' for a spare is '-1'.
My leaning is to not worry too much about /proc/mdstat, but instead add a
"--status" option to "mdadm" which prints out a summary similar
to /proc/mdstat, but more coherent and less full of noise.
mdadm --status
md0 : raid1 chunk=65536K bitmap_chunk=8KB metadata=1.2 size=976762496K
Working: 3[U_U] sda[0] sdc[1]
Spares: sdd
Failed: sdb
or something like that.
NeilBrown
* Re: 0.90 vs 1.X - Differing behavior for device # during fail/remove/add operation
From: Dusty Mabe @ 2013-05-16 4:14 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
On Wed, May 15, 2013 at 8:41 PM, NeilBrown <neilb@suse.de> wrote:
> The problem with doing this is that it is potentially an API change.
> It is unlikely but possible that some script depends on the current meaning
> of the number.
>
> Also it would result in spares being reported as e.g.
> sda1[-1]S
> as 'raid_disk' for a spare is '-1'.
>
Excellent points. Yes, it would be sort of an API change, which would
not be good for people who depend on the current behavior. That is
actually how I ended up "caring so much" about this in the first place:
0.90 metadata behaved one way and 1.X now behaves differently, so I need
to update some tools to take that into account. Before updating the
tools I wanted to make sure it wasn't a bug you intended to fix; if it
were, I would consider pulling in the kernel patch rather than changing
the tools.
>
> My leaning is to not worry too much about /proc/mdstat, but instead add a
> "--status" option to "mdadm" which prints out a summary similar
> to /proc/mdstat, but more coherent and less full of noise.
>
This looks nice, but I wouldn't discount the value of /proc/mdstat. If
monitoring tools check RAID status frequently, I would rather they read
a file maintained by the kernel than run a command and parse its output
each time.
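To make that concrete, here is a minimal sketch of the kind of kernel-file-based check I have in mind: it reads the [UU]/[U_] status field straight from an mdstat-format file instead of running and parsing an mdadm command. A temp file with sample content stands in for /proc/mdstat so the sketch runs anywhere; the `check_degraded` helper name is mine, not anything that exists today.

```shell
# Hypothetical health check reading an mdstat-format file directly.
check_degraded() {
    # An '_' inside the [UU...] status field marks a missing or failed member.
    if grep -q '\[[U_]*_[U_]*\]' "$1"; then
        echo degraded
    else
        echo healthy
    fi
}

# Sample content stands in for /proc/mdstat so this runs without a live array.
sample=$(mktemp)
printf 'md50 : active raid1 loop0[2] loop1[1]\n' > "$sample"
printf '      32760 blocks super 1.1 [2/1] [U_]\n' >> "$sample"
check_degraded "$sample"   # prints "degraded"
rm -f "$sample"
```

On a real system you would call `check_degraded /proc/mdstat`, avoiding any fork/exec of mdadm in the monitoring loop.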
Thanks for collaborating with me on this. I'll update my tools.
Dusty Mabe