* mdadm 3.2.2: Behavioral change when adding back a previously faulted device
From: Martin Steigerwald @ 2011-11-30 14:56 UTC
  To: Neil Brown; +Cc: linux-raid, Stefan Becker, linux-kernel

Hi Neil, hi Linux SoftRAID developers and users,

While preparing a best-practice / sample solution for some SoftRAID-related 
exercises in one of our Linux courses, I came across a behavioral change in 
mdadm that puzzled me. I use the Linux 3.1.0 Debian package.

I create a SoftRAID 1 on logical volumes located on different SATA disks:

mdadm --create --level 1 --raid-devices 2 /dev/md3 /dev/mango1/raidtest 
/dev/mango2/raidtest
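
(A side note for the exercise write-up: to block until the initial sync has 
finished, something like the following should work, assuming this mdadm 
version supports the --wait option:

mdadm --wait /dev/md3

Progress can also be watched with "watch -n1 cat /proc/mdstat".)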

I let it sync and then set one disk faulty:

mdadm --manage --set-faulty /dev/md3 /dev/mango2/raidtest

mango:~# head -3 /proc/mdstat
Personalities : [raid1] 
md3 : active raid1 dm-7[1](F) dm-6[0]
      52427704 blocks super 1.2 [2/1] [U_]
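
(For a closer look at the member states, the following should also report the 
device as faulty:

mdadm --detail /dev/md3
)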


Then I removed it:

mdadm /dev/md3 --remove failed

mango:~# head -3 /proc/mdstat
Personalities : [raid1] 
md3 : active raid1 dm-6[0]
      52427704 blocks super 1.2 [2/1] [U_]
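
(Note that "failed" is an mdadm keyword here that removes all failed devices 
from the array; the device could equally have been removed by name:

mdadm /dev/md3 --remove /dev/mango2/raidtest
)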


And then I tried adding it again with:

mango:~# mdadm -vv /dev/md3 --add /dev/mango2/raidtest
mdadm: /dev/mango2/raidtest reports being an active member for /dev/md3, but a 
--re-add fails.
mdadm: not performing --add as that would convert /dev/mango2/raidtest in to a 
spare.
mdadm: To make this a spare, use "mdadm --zero-superblock 
/dev/mango2/raidtest" first.


This is how it worked with mdadm up to 3.1.4 at least, and the behavior I was 
used to. That said, considering that re-adding the device failed, the error 
message makes some sense to me.


I explicitly tried to re-add it:

mango:~# mdadm -vv /dev/md3 --re-add /dev/mango2/raidtest
mdadm: --re-add for /dev/mango2/raidtest to /dev/md3 is not possible

Here mdadm fails to mention why it is not able to re-add the device.


Here is what I find on syslog:

mango:~# tail -15 /var/log/syslog
Nov 30 15:50:06 mango kernel: [11146.968265] md/raid1:md3: Disk failure on 
dm-3, disabling device.
Nov 30 15:50:06 mango kernel: [11146.968268] md/raid1:md3: Operation 
continuing on 1 devices.
Nov 30 15:50:06 mango kernel: [11146.996597] RAID1 conf printout:
Nov 30 15:50:06 mango kernel: [11146.996603]  --- wd:1 rd:2
Nov 30 15:50:06 mango kernel: [11146.996608]  disk 0, wo:0, o:1, dev:dm-6
Nov 30 15:50:06 mango kernel: [11146.996612]  disk 1, wo:1, o:0, dev:dm-3
Nov 30 15:50:06 mango kernel: [11147.020032] RAID1 conf printout:
Nov 30 15:50:06 mango kernel: [11147.020037]  --- wd:1 rd:2
Nov 30 15:50:06 mango kernel: [11147.020042]  disk 0, wo:0, o:1, dev:dm-6
Nov 30 15:50:11 mango kernel: [11151.631376] md: unbind<dm-3>
Nov 30 15:50:11 mango kernel: [11151.644064] md: export_rdev(dm-3)
Nov 30 15:50:17 mango kernel: [11157.787979] md: export_rdev(dm-3)
Nov 30 15:50:22 mango kernel: [11162.531139] md: export_rdev(dm-3)
Nov 30 15:50:25 mango kernel: [11165.883082] md: export_rdev(dm-3)
Nov 30 15:51:04 mango kernel: [11204.723241] md: export_rdev(dm-3)


We tried it with metadata 0.90 but saw the same behavior. Then we tried again 
after downgrading mdadm to 3.1.4: there, mdadm --add just added the device as a 
spare initially, and SoftRAID then used it for recovery once it found that it 
needed another disk to complete the RAID.

What works with mdadm 3.2.2 is to --zero-superblock the device and then --add 
it. Is that the recommended way of re-adding a device previously marked as 
faulty?
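
For reference, the full sequence that works here is the one the error message 
suggests (note that this wipes the md superblock on the member, so it is only 
appropriate for a device one really wants to turn into a fresh spare):

mdadm --zero-superblock /dev/mango2/raidtest
mdadm /dev/md3 --add /dev/mango2/raidtest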


I suspect the observed behavior might be partly due to 

commit d6508f0cfb60edf07b36f1532eae4d9cddf7178b
Author: NeilBrown <neilb@suse.de>
Date:   Mon Nov 22 19:35:25 2010 +1100

    Manage:  be more careful about --add attempts.
    
    If an --add is requested and a re-add looks promising but fails or
    cannot possibly succeed, then don't try the add.  This avoids
    inadvertently turning devices into spares when an array is failed but
    the devices seem to actually work.
    
    Signed-off-by: NeilBrown <neilb@suse.de>

which I also found as commit 8453e704305b92f043e436d6f90a0c5f068b09eb in the 
git log. But this doesn't explain why re-adding the device fails. Since the 
device was previously in this RAID array, shouldn't mdadm just be able to 
re-add it?


Now, is not being able to --re-add the device a (safety) feature or a bug?

I understand that it might not be common to re-add a device previously marked 
as faulty, but aside from being useful in an exercise, it can help if someone 
accidentally marked the wrong device as faulty.

Please advise.

Thanks,
-- 
Martin Steigerwald - teamix GmbH - http://www.teamix.de
gpg: 19E3 8D42 896F D004 08AC A0CA 1E10 C593 0399 AE90


* Re: mdadm 3.2.2: Behavioral change when adding back a previously faulted device
From: John Robinson @ 2011-11-30 15:21 UTC
  To: Martin Steigerwald; +Cc: Neil Brown, linux-raid, Stefan Becker

On 30/11/2011 14:56, Martin Steigerwald wrote:
> Hi Neil, hi Linux SoftRAID developers and users,
>
> [...]
>
> Now, is not being able to --re-add the device a (safety) feature or a bug?
>
> I understand that it might not be common to re-add a device previously marked
> as faulty, but aside from being useful in an exercise, it can help if someone
> accidentally marked the wrong device as faulty.
>
> Please advise.

This is deliberate to stop people overwriting discs they want to recover 
data from, when they said --add where they should have used --re-add.

Reasons for --re-add to fail include the array you're re-adding to having 
been updated since the drive you're re-adding was set faulty. Arrays with a 
write-intent bitmap can have devices re-added even if the array has been 
updated in the meantime, and only the missed updates are applied. Without the 
write-intent bitmap the whole disc needs to be resynced, which is what an 
--add would do, and that is why mdadm is now more cautious.
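
If memory serves, an internal write-intent bitmap can be added to an existing 
array after the fact with something like:

mdadm --grow /dev/md3 --bitmap=internal

After that, a --re-add of a transiently failed member should only have to 
resync the regions the bitmap flags as dirty.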

I think I've got this right and I hope it helps!

Cheers,

John.


