* mdadm --fail doesn't mark device as failed?
@ 2012-11-21 16:17 Ross Boylan
2012-11-21 16:53 ` Sebastian Riemer
2012-11-22 4:40 ` NeilBrown
0 siblings, 2 replies; 14+ messages in thread
From: Ross Boylan @ 2012-11-21 16:17 UTC (permalink / raw)
To: linux-raid; +Cc: ross
After I failed and removed a partition, mdadm --examine seems to show
that partition is fine.
Perhaps related to this, I failed a partition and when I rebooted it
came up as the sole member of its RAID array.
Is this behavior expected? Is there a way to make the failures more
convincing?
The drive sdb in the following excerpt does appear to be experiencing
hardware problems. However, the failed partition that became the md on
reboot was on a drive without any reported problems.
<terminal>
markov:/# date; mdadm --examine -v /dev/sdb1
Wed Nov 21 07:56:32 PST 2012
/dev/sdb1:
Magic : a92b4efc
Version : 00.90.00
UUID : 313d5489:7869305b:c4290e12:319afc54
Creation Time : Mon Dec 15 06:49:51 2008
Raid Level : raid1
Used Dev Size : 96256 (94.02 MiB 98.57 MB)
Array Size : 96256 (94.02 MiB 98.57 MB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Update Time : Wed Nov 21 07:37:14 2012
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Checksum : e28ac7f6 - correct
Events : 1696
Number Major Minor RaidDevice State
this 1 8 17 1 active sync /dev/sdb1
0 0 8 1 0 active sync /dev/sda1
1 1 8 17 1 active sync /dev/sdb1
markov:/# mdadm --fail /dev/md1 /dev/sdb3
mdadm: set /dev/sdb3 faulty in /dev/md1
markov:/# mdadm --fail /dev/md0 /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md0
markov:/# mdadm --remove /dev/md0 /dev/sdb1
mdadm: hot removed /dev/sdb1
markov:/# mdadm --remove /dev/md1 /dev/sdb3
mdadm: hot removed /dev/sdb3
markov:/# date; mdadm --examine -v /dev/sdb3
Wed Nov 21 07:57:54 PST 2012
/dev/sdb3:
Magic : a92b4efc
Version : 00.90.00
UUID : b77027df:d6aa474a:c4290e12:319afc54
Creation Time : Mon Dec 15 06:50:18 2008
Raid Level : raid1
Used Dev Size : 730523648 (696.68 GiB 748.06 GB)
Array Size : 730523648 (696.68 GiB 748.06 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Update Time : Wed Nov 21 07:56:44 2012
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Checksum : f322bc10 - correct
Events : 5067056
Number Major Minor RaidDevice State
this 1 8 19 1 active sync /dev/sdb3
0 0 8 3 0 active sync /dev/sda3
1 1 8 19 1 active sync /dev/sdb3
markov:/# mdadm -Q /dev/md0
/dev/md0: 94.00MiB raid1 2 devices, 0 spares. Use mdadm --detail for more detail.
markov:/# date; mdadm --detail /dev/md0
Wed Nov 21 07:58:56 PST 2012
/dev/md0:
Version : 00.90
Creation Time : Mon Dec 15 06:49:51 2008
Raid Level : raid1
Array Size : 96256 (94.02 MiB 98.57 MB)
Used Dev Size : 96256 (94.02 MiB 98.57 MB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Wed Nov 21 07:57:35 2012
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
UUID : 313d5489:7869305b:c4290e12:319afc54
Events : 0.1700
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 0 0 1 removed
markov:/# mdadm --grow /dev/md0 -n 1 --force
</terminal>
* Re: mdadm --fail doesn't mark device as failed?
2012-11-21 16:17 mdadm --fail doesn't mark device as failed? Ross Boylan
@ 2012-11-21 16:53 ` Sebastian Riemer
2012-11-21 17:03 ` Ross Boylan
2012-11-22 4:42 ` NeilBrown
2012-11-22 4:40 ` NeilBrown
1 sibling, 2 replies; 14+ messages in thread
From: Sebastian Riemer @ 2012-11-21 16:53 UTC (permalink / raw)
To: Ross Boylan; +Cc: linux-raid
On 21.11.2012 17:17, Ross Boylan wrote:
> After I failed and removed a partition, mdadm --examine seems to show
> that partition is fine.
>
> Perhaps related to this, I failed a partition and when I rebooted it
> came up as the sole member of its RAID array.
>
> Is this behavior expected? Is there a way to make the failures more
> convincing?
Yes, it is expected behavior. Without "mdadm --fail" you can't remove a
device from the array. If you stop the array with the failed device,
then the state is stored in the superblock.
There is a difference in the way mdadm does it and the sysfs method.
mdadm sends an ioctl to the kernel. With the sysfs command the faulty
state is stored immediately in the superblock.
# echo faulty > /sys/block/md0/md/dev-sdb1/state
If you reassemble that you'll get the message:
mdadm: device 0 in /dev/md0 has wrong state in superblock, but /dev/sdb1
seems ok
There is a limit of how many errors are allowed on the device (usually 20).
If you do the following additionally, your device won't be used for
assembly anymore.
# echo 20 > /sys/block/md0/md/dev-sdb1/errors
I guess this is related to: /sys/block/md0/md/max_read_errors.
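Putting that together, a rough sketch of the whole sysfs sequence (using the
md0/sdb1 paths from above; the threshold value is just the usual default):
# cat /sys/block/md0/md/max_read_errors          # array-wide threshold, usually 20
# echo faulty > /sys/block/md0/md/dev-sdb1/state # mark the member faulty
# echo 20 > /sys/block/md0/md/dev-sdb1/errors    # raise its error count to the threshold
# cat /sys/block/md0/md/dev-sdb1/errors          # verify the stored count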
> The drive sdb in the following excerpt does appear to be experiencing
> hardware problems. However, the failed partition that became the md on
> reboot was on a drive without any reported problems.
>
* Re: mdadm --fail doesn't mark device as failed?
2012-11-21 16:53 ` Sebastian Riemer
@ 2012-11-21 17:03 ` Ross Boylan
2012-11-21 17:10 ` Sebastian Riemer
2012-11-22 4:42 ` NeilBrown
1 sibling, 1 reply; 14+ messages in thread
From: Ross Boylan @ 2012-11-21 17:03 UTC (permalink / raw)
To: Sebastian Riemer; +Cc: ross, linux-raid
On Wed, 2012-11-21 at 17:53 +0100, Sebastian Riemer wrote:
> On 21.11.2012 17:17, Ross Boylan wrote:
> > After I failed and removed a partition, mdadm --examine seems to show
> > that partition is fine.
> >
> > Perhaps related to this, I failed a partition and when I rebooted it
> > came up as the sole member of its RAID array.
> >
> > Is this behavior expected? Is there a way to make the failures more
> > convincing?
>
> Yes, it is expected behavior. Without "mdadm --fail" you can't remove a
> device from the array. If you stop the array with the failed device,
> then the state is stored in the superblock.
I'm confused. I did run mdadm --fail. Are you saying that, in addition
to doing that, I also need to manipulate sysfs as you describe below?
Or were you assuming I didn't mdadm --fail?
Ross
>
> There is a difference in the way mdadm does it and the sysfs method.
> mdadm sends an ioctl to the kernel. With the sysfs command the faulty
> state is stored immediately in the superblock.
>
> # echo faulty > /sys/block/md0/md/dev-sdb1/state
>
> If you reassemble that you'll get the message:
> mdadm: device 0 in /dev/md0 has wrong state in superblock, but /dev/sdb1
> seems ok
>
> There is a limit of how many errors are allowed on the device (usually 20).
>
> If you do the following additionally, your device won't be used for
> assembly anymore.
> # echo 20 > /sys/block/md0/md/dev-sdb1/errors
>
> I guess this is related to: /sys/block/md0/md/max_read_errors.
>
> > The drive sdb in the following excerpt does appear to be experiencing
> > hardware problems. However, the failed partition that became the md on
> > reboot was on a drive without any reported problems.
> >
>
* Re: mdadm --fail doesn't mark device as failed?
2012-11-21 17:03 ` Ross Boylan
@ 2012-11-21 17:10 ` Sebastian Riemer
2012-11-21 17:23 ` Ross Boylan
0 siblings, 1 reply; 14+ messages in thread
From: Sebastian Riemer @ 2012-11-21 17:10 UTC (permalink / raw)
To: Ross Boylan; +Cc: linux-raid
On 21.11.2012 18:03, Ross Boylan wrote:
> On Wed, 2012-11-21 at 17:53 +0100, Sebastian Riemer wrote:
>> On 21.11.2012 17:17, Ross Boylan wrote:
>>> After I failed and removed a partition, mdadm --examine seems to show
>>> that partition is fine.
>>>
>>> Perhaps related to this, I failed a partition and when I rebooted it
>>> came up as the sole member of its RAID array.
>>>
>>> Is this behavior expected? Is there a way to make the failures more
>>> convincing?
>> Yes, it is expected behavior. Without "mdadm --fail" you can't remove a
>> device from the array. If you stop the array with the failed device,
>> then the state is stored in the superblock.
> I'm confused. I did run mdadm --fail. Are you saying that, in addition
> to doing that, I also need to manipulate sysfs as you describe below?
> Or were you assuming I didn't mdadm --fail?
You only need to set the value in the "errors" sysfs file additionally
to ensure that this device isn't used for assembly anymore.
The kernel reports in "dmesg" then:
md: kicking non-fresh sdb1 from array!
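To confirm that this actually happened after a reboot or reassembly, something
like this should do (device name assumed as before):
# dmesg | grep -i non-fresh
# cat /proc/mdstat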
>> There is a difference in the way mdadm does it and the sysfs method.
>> mdadm sends an ioctl to the kernel. With the sysfs command the faulty
>> state is stored immediately in the superblock.
>>
>> # echo faulty > /sys/block/md0/md/dev-sdb1/state
>>
>> If you reassemble that you'll get the message:
>> mdadm: device 0 in /dev/md0 has wrong state in superblock, but /dev/sdb1
>> seems ok
>>
>> There is a limit of how many errors are allowed on the device (usually 20).
>>
>> If you do the following additionally, your device won't be used for
>> assembly anymore.
>> # echo 20 > /sys/block/md0/md/dev-sdb1/errors
>>
>> I guess this is related to: /sys/block/md0/md/max_read_errors.
>>
>>> The drive sdb in the following excerpt does appear to be experiencing
>>> hardware problems. However, the failed partition that became the md on
>>> reboot was on a drive without any reported problems.
>>>
--
Sebastian Riemer
Linux Kernel Developer - Storage
We are looking for (SENIOR) LINUX KERNEL DEVELOPERS!
ProfitBricks GmbH • Greifswalder Str. 207 • 10405 Berlin, Germany
www.profitbricks.com • sebastian.riemer@profitbricks.com
Tel.: +49 - 30 - 60 98 56 991 - 915
Sitz der Gesellschaft: Berlin
Registergericht: Amtsgericht Charlottenburg, HRB 125506 B
Geschäftsführer: Andreas Gauger, Achim Weiss
* Re: mdadm --fail doesn't mark device as failed?
2012-11-21 17:10 ` Sebastian Riemer
@ 2012-11-21 17:23 ` Ross Boylan
2012-11-21 17:47 ` Sebastian Riemer
0 siblings, 1 reply; 14+ messages in thread
From: Ross Boylan @ 2012-11-21 17:23 UTC (permalink / raw)
To: linux-raid, Sebastian Riemer; +Cc: ross
On Wed, 2012-11-21 at 18:10 +0100, Sebastian Riemer wrote:
> On 21.11.2012 18:03, Ross Boylan wrote:
> > On Wed, 2012-11-21 at 17:53 +0100, Sebastian Riemer wrote:
> >> On 21.11.2012 17:17, Ross Boylan wrote:
> >>> After I failed and removed a partition, mdadm --examine seems to show
> >>> that partition is fine.
> >>>
> >>> Perhaps related to this, I failed a partition and when I rebooted it
> >>> came up as the sole member of its RAID array.
> >>>
> >>> Is this behavior expected? Is there a way to make the failures more
> >>> convincing?
> >> Yes, it is expected behavior. Without "mdadm --fail" you can't remove a
> >> device from the array. If you stop the array with the failed device,
> >> then the state is stored in the superblock.
> > I'm confused. I did run mdadm --fail. Are you saying that, in addition
> > to doing that, I also need to manipulate sysfs as you describe below?
> > Or were you assuming I didn't mdadm --fail?
>
> You only need to set the value in the "errors" sysfs file additionally
> to ensure that this device isn't used for assembly anymore.
>
> The kernel reports in "dmesg" then:
> md: kicking non-fresh sdb1 from array!
>
OK. So if I understand correctly, mdadm --fail has no effect that
persists past a reboot, and doesn't write to disk anything that would
prevent the use of the failed RAID component.(*) But if I write to
sysfs, the failure will persist across reboots.
This behavior is quite surprising to me. Is there some reason for this
design?
Ross
(*) Also the different update or last use times either aren't recorded
or don't affect the RAID assembly decision. For example, in my case md1
included sda3 and sdc3. I failed sdc3, so that only sda3 had the most
current data. But when the system rebooted, md1 was assembled from sdc3
only.
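For what it's worth, this is how I would expect to check which copy is
fresher (a sketch with my device names):
# mdadm --examine /dev/sda3 | grep -E 'Update Time|Events'
# mdadm --examine /dev/sdc3 | grep -E 'Update Time|Events'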
* Re: mdadm --fail doesn't mark device as failed?
2012-11-21 17:23 ` Ross Boylan
@ 2012-11-21 17:47 ` Sebastian Riemer
2012-11-21 19:41 ` Ross Boylan
2012-11-21 19:52 ` Ross Boylan
0 siblings, 2 replies; 14+ messages in thread
From: Sebastian Riemer @ 2012-11-21 17:47 UTC (permalink / raw)
To: Ross Boylan; +Cc: linux-raid
On 21.11.2012 18:23, Ross Boylan wrote:
> On Wed, 2012-11-21 at 18:10 +0100, Sebastian Riemer wrote:
>> On 21.11.2012 18:03, Ross Boylan wrote:
>>> On Wed, 2012-11-21 at 17:53 +0100, Sebastian Riemer wrote:
>>>> On 21.11.2012 17:17, Ross Boylan wrote:
>>>>> After I failed and removed a partition, mdadm --examine seems to show
>>>>> that partition is fine.
>>>>>
>>>>> Perhaps related to this, I failed a partition and when I rebooted it
>>>>> came up as the sole member of its RAID array.
>>>>>
>>>>> Is this behavior expected? Is there a way to make the failures more
>>>>> convincing?
>>>> Yes, it is expected behavior. Without "mdadm --fail" you can't remove a
>>>> device from the array. If you stop the array with the failed device,
>>>> then the state is stored in the superblock.
>>> I'm confused. I did run mdadm --fail. Are you saying that, in addition
>>> to doing that, I also need to manipulate sysfs as you describe below?
>>> Or were you assuming I didn't mdadm --fail?
>> You only need to set the value in the "errors" sysfs file additionally
>> to ensure that this device isn't used for assembly anymore.
>>
>> The kernel reports in "dmesg" then:
>> md: kicking non-fresh sdb1 from array!
>>
> OK. So if I understand correctly, mdadm --fail has no effect that
> persists past a reboot, and doesn't write to disk anything that would
> prevent the use of the failed RAID component.(*) But if I write to
> sysfs, the failure will persist across reboots.
>
> This behavior is quite surprising to me. Is there some reason for this
> design?
Yes, sometimes hardware has only a short issue and operates as expected
afterwards. Therefore, there is an error threshold. It could be very
annoying to zero the superblock and to resync everything only because
there was a short controller issue or something similar. Without this
you also couldn't remove and re-add devices for testing.
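A sketch of such a test cycle (array and member names assumed; with a
write-intent bitmap the re-add only resyncs what changed, otherwise expect
a full resync or a plain --add):
# mdadm /dev/md0 --fail /dev/sdb1
# mdadm /dev/md0 --remove /dev/sdb1
# mdadm /dev/md0 --re-add /dev/sdb1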
> (*) Also the different update or last use times either aren't recorded
> or don't affect the RAID assembly decision. For example, in my case md1
> included sda3 and sdc3. I failed sdc3, so that only sda3 had the most
> current data. But when the system rebooted, md1 was assembled from sdc3
> only.
This is not the expected behavior. The superblock (at least metadata
1.2) has an update timestamp "utime". If something changes the
superblock on the remaining device only, it is clear that this device
has the most current data.
I'm not sure if this really works for your kernel and mdadm. Ask Neil
Brown for further details.
* Re: mdadm --fail doesn't mark device as failed?
2012-11-21 17:47 ` Sebastian Riemer
@ 2012-11-21 19:41 ` Ross Boylan
2012-11-22 9:43 ` Sebastian Riemer
2012-11-21 19:52 ` Ross Boylan
1 sibling, 1 reply; 14+ messages in thread
From: Ross Boylan @ 2012-11-21 19:41 UTC (permalink / raw)
To: Sebastian Riemer; +Cc: ross, linux-raid
On Wed, 2012-11-21 at 18:47 +0100, Sebastian Riemer wrote:
> On 21.11.2012 18:23, Ross Boylan wrote:
> > On Wed, 2012-11-21 at 18:10 +0100, Sebastian Riemer wrote:
> >> On 21.11.2012 18:03, Ross Boylan wrote:
> >>> On Wed, 2012-11-21 at 17:53 +0100, Sebastian Riemer wrote:
> >>>> On 21.11.2012 17:17, Ross Boylan wrote:
> >>>>> After I failed and removed a partition, mdadm --examine seems to show
> >>>>> that partition is fine.
> >>>>>
> >>>>> Perhaps related to this, I failed a partition and when I rebooted it
> >>>>> came up as the sole member of its RAID array.
> >>>>>
> >>>>> Is this behavior expected? Is there a way to make the failures more
> >>>>> convincing?
> >>>> Yes, it is expected behavior. Without "mdadm --fail" you can't remove a
> >>>> device from the array. If you stop the array with the failed device,
> >>>> then the state is stored in the superblock.
> >>> I'm confused. I did run mdadm --fail. Are you saying that, in addition
> >>> to doing that, I also need to manipulate sysfs as you describe below?
> >>> Or were you assuming I didn't mdadm --fail?
> >> You only need to set the value in the "errors" sysfs file additionally
> >> to ensure that this device isn't used for assembly anymore.
> >>
> >> The kernel reports in "dmesg" then:
> >> md: kicking non-fresh sdb1 from array!
> >>
> > OK. So if I understand correctly, mdadm --fail has no effect that
> > persists past a reboot, and doesn't write to disk anything that would
> > prevent the use of the failed RAID component.(*) But if I write to
> > sysfs, the failure will persist across reboots.
> >
> > This behavior is quite surprising to me. Is there some reason for this
> > design?
>
> Yes, sometimes hardware has only a short issue and operates as expected
> afterwards. Therefore, there is an error threshold. It could be very
> annoying to zero the superblock and to resync everything only because
> there was a short controller issue or something similar. Without this
> you also couldn't remove and re-add devices for testing.
So if my intention is to remove the "device" (in this case, partition)
across reboots, is using sysfs as you indicated sufficient? Zeroing the
superblock (--zero-superblock)? Removing the device (mdadm --remove)?
In this particular case the partition was fine, and my thought was I
might add it back later. But since the info would be dated, I guess
there was no real benefit to preserving the superblock. I did want to
preserve the data in case things went catastrophically wrong.
>
> > (*) Also the different update or last use times either aren't recorded
> > or don't affect the RAID assembly decision. For example, in my case md1
> > included sda3 and sdc3. I failed sdc3, so that only sda3 had the most
> > current data. But when the system rebooted, md1 was assembled from sdc3
> > only.
>
> This is not the expected behavior. The superblock (at least metadata
> 1.2) has an update timestamp "utime". If something changes the
> superblock on the remaining device only, it is clear that this device
> has the most current data.
> I'm not sure if this really works for your kernel and mdadm. Ask Neil
> Brown for further details.
These were 0.90 format disks; the --detail report does include an update
time.
Maybe the "right" md array was considered unbootable and it failed over
to the other one?
At the time I failed sdc3, it was in the md1 array that had sda3 and
sdc3, size 2.
When I rebooted md1 was sda3, sdd4, and sde4, size 3 (+1 spare, I think,
for the failed sdc3). If the GPT disk partitions were not visible, sdd4
and sde4 would have been unavailable, so the choice would have been
bringing up md1 with 1 of 3 devices, sda3, or md1 with sdc3, one of 2
devices. At least it didn't try to put sda3 and sdc3 together.
The "invisible GPT" theory fits what I saw with the Knoppix 6
environment, but it does not fit the fact that md0 came up with sda1 and
sdd2 and sdd2 is a GPT partition the first time I booted in Debian.
Thanks for helping me out with this.
Ross
* Re: mdadm --fail doesn't mark device as failed?
2012-11-21 17:47 ` Sebastian Riemer
2012-11-21 19:41 ` Ross Boylan
@ 2012-11-21 19:52 ` Ross Boylan
1 sibling, 0 replies; 14+ messages in thread
From: Ross Boylan @ 2012-11-21 19:52 UTC (permalink / raw)
To: Sebastian Riemer; +Cc: ross, linux-raid
On Wed, 2012-11-21 at 18:47 +0100, Sebastian Riemer wrote:
> > OK. So if I understand correctly, mdadm --fail has no effect that
> > persists past a reboot, and doesn't write to disk anything that would
> > prevent the use of the failed RAID component.(*) But if I write to
> > sysfs, the failure will persist across reboots.
> >
> > This behavior is quite surprising to me. Is there some reason for
> > this design?
>
> Yes, sometimes hardware has only a short issue and operates as expected
> afterwards. Therefore, there is an error threshold. It could be very
> annoying to zero the superblock and to resync everything only because
> there was a short controller issue or something similar. Without this
> you also couldn't remove and re-add devices for testing.
BTW, the part that was surprising was not that the device could be
re-added, but that it was re-added automatically on reboot.
At the moment I have quite a few partitions running around with the same
md UUID but slightly different information on them.
Ross
* Re: mdadm --fail doesn't mark device as failed?
2012-11-21 16:17 mdadm --fail doesn't mark device as failed? Ross Boylan
2012-11-21 16:53 ` Sebastian Riemer
@ 2012-11-22 4:40 ` NeilBrown
2012-11-23 23:58 ` Ross Boylan
1 sibling, 1 reply; 14+ messages in thread
From: NeilBrown @ 2012-11-22 4:40 UTC (permalink / raw)
To: Ross Boylan; +Cc: linux-raid
On Wed, 21 Nov 2012 08:17:57 -0800 Ross Boylan <ross@biostat.ucsf.edu> wrote:
> After I failed and removed a partition, mdadm --examine seems to show
> that partition is fine.
Correct. When a device fails it is assumed that it has failed and probably
cannot be written to. So no attempt is made to write to it, so it will look
unchanged to --examine.
All the other devices in the array will record the fact that that device is
now faulty, and their event counts are increased so their idea of the status
of the various devices will take priority over the info stored on the faulty
device - should it still be readable.
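So the authoritative state is on the surviving members and the array itself,
e.g. with the devices from your transcript:
# mdadm --examine /dev/sda1    # its device table should now list the peer as faulty/removed
# mdadm --detail /dev/md0      # "clean, degraded", as in your output above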
>
> Perhaps related to this, I failed a partition and when I rebooted it
> came up as the sole member of its RAID array.
This is a bug which is fixed in my mdadm development tree which will
eventually become mdadm-3.3.
Does the other device get assembled into a different array, so you
end up with two arrays (split brain)?
What can happen is "mdadm --incremental /dev/whatever" is called on each device
and that results in the correct array (with non-failed device) being
assembled.
Then "mdadm -As" gets run and it sees the failed device and doesn't notice
the other array, so it assembles the failed device into an array of its own.
The fix causes "mdadm -As" to notice the arrays that "mdadm --incremental"
has created.
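Roughly, the sequence that goes wrong looks like this (a sketch with your
device names; the real calls come from udev and the init scripts):
# mdadm --incremental /dev/sda3   # healthy member: the correct, degraded md1 is started
# mdadm --incremental /dev/sdc3   # the failed member does not end up in that array
# mdadm -As                       # the scan sees sdc3, misses the running md1,
#                                 #   and assembles a second md1 from sdc3 alone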
>
> Is this behavior expected? Is there a way to make the failures more
> convincing?
mdadm --zero /dev/whatever
after failing and removing the device.
Or unplug it and put in an acid bath - that makes failure pretty convincing.
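Spelled out for your array, that would be something like (the 0.90 superblock
sits near the end of the partition, so zeroing it should leave the data area
untouched):
# mdadm /dev/md1 --fail /dev/sdc3
# mdadm /dev/md1 --remove /dev/sdc3
# mdadm --zero-superblock /dev/sdc3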
NeilBrown
* Re: mdadm --fail doesn't mark device as failed?
2012-11-21 16:53 ` Sebastian Riemer
2012-11-21 17:03 ` Ross Boylan
@ 2012-11-22 4:42 ` NeilBrown
1 sibling, 0 replies; 14+ messages in thread
From: NeilBrown @ 2012-11-22 4:42 UTC (permalink / raw)
To: Sebastian Riemer; +Cc: Ross Boylan, linux-raid
On Wed, 21 Nov 2012 17:53:58 +0100 Sebastian Riemer
<sebastian.riemer@profitbricks.com> wrote:
> On 21.11.2012 17:17, Ross Boylan wrote:
> > After I failed and removed a partition, mdadm --examine seems to show
> > that partition is fine.
> >
> > Perhaps related to this, I failed a partition and when I rebooted it
> > came up as the sole member of its RAID array.
> >
> > Is this behavior expected? Is there a way to make the failures more
> > convincing?
>
> Yes, it is expected behavior. Without "mdadm --fail" you can't remove a
> device from the array. If you stop the array with the failed device,
> then the state is stored in the superblock.
>
> There is a difference in the way mdadm does it and the sysfs method.
> mdadm sends an ioctl to the kernel. With the sysfs command the faulty
> state is stored immediately in the superblock.
>
> # echo faulty > /sys/block/md0/md/dev-sdb1/state
>
This is not true. "mdadm --fail" and "echo faulty > state" have exactly the
same effect on the array. They simulate an error occurring.
NeilBrown
* Re: mdadm --fail doesn't mark device as failed?
2012-11-21 19:41 ` Ross Boylan
@ 2012-11-22 9:43 ` Sebastian Riemer
2012-11-22 10:07 ` Sebastian Riemer
0 siblings, 1 reply; 14+ messages in thread
From: Sebastian Riemer @ 2012-11-22 9:43 UTC (permalink / raw)
To: Ross Boylan; +Cc: linux-raid
On 21.11.2012 20:41, Ross Boylan wrote:
> On Wed, 2012-11-21 at 18:47 +0100, Sebastian Riemer wrote:
>
>> Yes, sometimes hardware has only a short issue and operates as expected
>> afterwards. Therefore, there is an error threshold. It could be very
>> annoying to zero the superblock and to resync everything only because
>> there was a short controller issue or something similar. Without this
>> you also couldn't remove and re-add devices for testing.
> So if my intention is to remove the "device" (in this case, partition)
> across reboots is using sysfs as you indicated sufficient?
Yes, if you set a high number into sysfs file "errors", then you can
even keep the superblock but don't ask me how to revert this change. I
don't think that there is a "MakeGood" command.
> Zeroing the superblock (--zero-superblock)?
That's the alternative but you lose superblock data.
> Removing the device (mdadm --remove)?
Here you need one of the methods above additionally.
> In this particular case the partition was fine, and my thought was I
> might add it back later. But since the info would be dated, I guess
> there was no real benefit to preserving the superblock. I did want to
> preserve the data in case things went catastrophically wrong.
You don't really have a benefit of keeping the superblock. The only
useful information is to which device it belonged. In general you
replace the failed drive and the new device is synced from the remaining
good drive. Without the superblock you can read the actual data anyway
starting from the data offset.
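As an illustration (just a sketch, device name assumed): with 1.x metadata the
offset is reported by --examine as "Data Offset"; with 0.90 metadata the
superblock is at the end of the member, so the data starts at offset 0 and can
be read directly, e.g. read-only through a loop device:
# mdadm --examine /dev/sdb3 | grep -i 'data offset'   # 1.x metadata only
# losetup --read-only --offset $((OFFSET_SECTORS * 512)) /dev/loop0 /dev/sdb3   # OFFSET_SECTORS=0 for 0.90
# mount -o ro /dev/loop0 /mnt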
* Re: mdadm --fail doesn't mark device as failed?
2012-11-22 9:43 ` Sebastian Riemer
@ 2012-11-22 10:07 ` Sebastian Riemer
2012-11-24 0:29 ` Ross Boylan
0 siblings, 1 reply; 14+ messages in thread
From: Sebastian Riemer @ 2012-11-22 10:07 UTC (permalink / raw)
To: Ross Boylan; +Cc: linux-raid
On 22.11.2012 10:43, Sebastian Riemer wrote:
> On 21.11.2012 20:41, Ross Boylan wrote:
>> On Wed, 2012-11-21 at 18:47 +0100, Sebastian Riemer wrote:
>>
>>> Yes, sometimes hardware has only a short issue and operates as expected
>>> afterwards. Therefore, there is an error threshold. It could be very
>>> annoying to zero the superblock and to resync everything only because
>>> there was a short controller issue or something similar. Without this
>>> you also couldn't remove and re-add devices for testing.
>> So if my intention is to remove the "device" (in this case, partition)
>> across reboots is using sysfs as you indicated sufficient?
> Yes, if you set a high number into sysfs file "errors", then you can
> even keep the superblock but don't ask me how to revert this change. I
> don't think that there is a "MakeGood" command.
>
>> Zeroing the superblock (--zero-superblock)?
> That's the alternative but you lose superblock data.
>
>> Removing the device (mdadm --remove)?
> Here you need one of the methods above additionally.
Correction: this also has the effect that the device isn't assembled again
after setting it faulty.
There is a difference between doing --faulty, --stop and doing --faulty, --remove, --stop.
>> In this particular case the partition was fine, and my thought was I
>> might add it back later. But since the info would be dated, I guess
>> there was no real benefit to preserving the superblock. I did want to
>> preserve the data in case things went catastrophically wrong.
> You don't really have a benefit of keeping the superblock. The only
> useful information is to which device it belonged. In general you
> replace the failed drive and the new device is synced from the remaining
> good drive. Without the superblock you can read the actual data anyway
> starting from the data offset.
>
* Re: mdadm --fail doesn't mark device as failed?
2012-11-22 4:40 ` NeilBrown
@ 2012-11-23 23:58 ` Ross Boylan
0 siblings, 0 replies; 14+ messages in thread
From: Ross Boylan @ 2012-11-23 23:58 UTC (permalink / raw)
To: NeilBrown; +Cc: ross, linux-raid
On Thu, 2012-11-22 at 15:40 +1100, NeilBrown wrote:
> On Wed, 21 Nov 2012 08:17:57 -0800 Ross Boylan <ross@biostat.ucsf.edu> wrote:
>
> > After I failed and removed a partition, mdadm --examine seems to show
> > that partition is fine.
>
> Correct. When a device fails it is assumed that it has failed and probably
> cannot be written to. So no attempt is made to write to it, so it will look
> unchanged to --examine.
>
> All the other devices in the array will record the fact that that device is
> now faulty, and their event counts are increased so their idea of the status
> of the various devices will take priority over the info stored on the faulty
> device - should it still be readable.
>
> >
> > Perhaps related to this, I failed a partition and when I rebooted it
> > came up as the sole member of its RAID array.
>
> This is a bug which is fixed in my mdadm development tree which will
> eventually become mdadm-3.3.
Could you say more about the bug, or point me to details? The behavior
has me a bit spooked and worried about putting drives in my machine,
given that all my drives have partitions that participated in md0 and
md1 at various times. If I knew exactly what triggered it I could
proceed more effectively, and less dangerously.
I guess I could break into the initrd (e.g., with the break=init option on
the kernel command line) and check whether things look OK before letting it
pivot to the real system.
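Something like this at the (initramfs) prompt, I suppose, assuming mdadm is
available there:
# cat /proc/mdstat
# mdadm --detail /dev/md1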
Also, do the fixes involve any kernel-level code?
>
> Does the other device get assembled into a different array, so you
> end up with two arrays (split brain)?
No, if I understand the question. md1 was originally sda3 and sdc3.
After failing sdc3 I added sdd4 and sde4, and grew md1 to use both new
drives. When the system rebooted, md1 consisted of sdc3 only, and the
other partitions were just left as plain partitions. In at least some boot
environments sdd4 and sde4 were not recognized by the kernel, presumably
because they were on GPT disks. I know the 2.6.32 kernel under Knoppix
6 did not recognize them; whether the first reboot, using the Debian
initrd and 2.6.32 kernel, could see them I'm not sure.
Later I found the Debian initrd would not activate the md devices if
they were missing any disks; given that, it's puzzling that it did
activate md1 with sdc3, since sdc3 thought it was in a 2-disk array.
>
> What can happen is "mdadm --incremental /dev/whatever" is called on each device
> and that results in the correct array (with non-failed device) being
> assembled.
> Then "mdadm -As" gets run and it sees the failed device and doesn't notice
> the other array, so it assembles the failed device into an array of its own.
>
> The fix causes "mdadm -As" to notice the arrays that "mdadm --incremental"
> has created.
>
>
> >
> > Is this behavior expected? Is there a way to make the failures more
> > convincing?
>
> mdadm --zero /dev/whatever
Is zeroing the superblock sufficient? I'd like to preserve the data.
>
> after failing and removing the device.
> Or unplug it and put in an acid bath - that makes failure pretty convincing.
>
> NeilBrown
* Re: mdadm --fail doesn't mark device as failed?
2012-11-22 10:07 ` Sebastian Riemer
@ 2012-11-24 0:29 ` Ross Boylan
0 siblings, 0 replies; 14+ messages in thread
From: Ross Boylan @ 2012-11-24 0:29 UTC (permalink / raw)
To: Sebastian Riemer; +Cc: ross, linux-raid
On Thu, 2012-11-22 at 11:07 +0100, Sebastian Riemer wrote:
> On 22.11.2012 10:43, Sebastian Riemer wrote:
> > On 21.11.2012 20:41, Ross Boylan wrote:
> >> On Wed, 2012-11-21 at 18:47 +0100, Sebastian Riemer wrote:
> >>
> >>> Yes, sometimes hardware has only a short issue and operates as expected
> >>> afterwards. Therefore, there is an error threshold. It could be very
> >>> annoying to zero the superblock and to resync everything only because
> >>> there was a short controller issue or something similar. Without this
> >>> you also couldn't remove and re-add devices for testing.
> >> So if my intention is to remove the "device" (in this case, partition)
> >> across reboots is using sysfs as you indicated sufficient?
> > Yes, if you set a high number into sysfs file "errors", then you can
> > even keep the superblock but don't ask me how to revert this change. I
> > don't think that there is a "MakeGood" command.
> >
> >> Zeroing the superblock (--zero-superblock)?
> > That's the alternative but you lose superblock data.
> >
> >> Removing the device (mdadm --remove)?
> > Here you need one of the methods above additionally.
>
> Correction: this also has the effect that the device isn't assembled again
> after setting it faulty.
By "the device" do you mean the md device, or the particular member of
the array? My goal is to remove the array member (sdc3) but keep the
array (md1).
>
> There is a difference between doing --faulty, --stop and doing --faulty, --remove, --stop.
Since most of my system is on md1, --stop is not possible with the system
running. I believe one is executed as the system shuts down; I could also boot
to a rescue environment if issuing the --stop is important.
I think I've received two inconsistent pieces of information; you just
said that --faulty, --remove, --stop will ensure that the array doesn't
restart, while Neil said that when a device fails no attempt is made to
write to it:
> When a device fails it is assumed that it has failed and probably
> cannot be written to. So no attempt is made to write to it, so it
> will look unchanged to --examine.
In principle all the statements could be true if --fail writes nothing
but later steps do, but that seems a strained reading of Neil's
statement.
Ross
>
> >> In this particular case the partition was fine, and my thought was I
> >> might add it back later. But since the info would be dated, I guess
> >> there was no real benefit to preserving the superblock. I did want to
> >> preserve the data in case things went catastrophically wrong.
> > You don't really have a benefit of keeping the superblock. The only
> > useful information is to which device it belonged. In general you
> > replace the failed drive and the new device is synced from the remaining
> > good drive. Without the superblock you can read the actual data anyway
> > starting from the data offset.
> >
>