* RAID 0 md device still active after pulled drive
From: thomas62186218 @ 2008-10-18 0:29 UTC
To: linux-raid
Hi All,
I have run into a most unusual behavior, where mdadm reports a RAID 0
array that is missing a drive as "Active".
Environment:
Ubuntu 8.04 Hardy 64-bit
mdadm: 2.6.7
Dual-socket, quad-core Intel server
8GB RAM
8 SATA II drives
LSI SAS1068 controller
Scenario:
1) I have a RAID 0 created from two drives:
md2 : active raid0 sde1[1] sdd1[0]
488391680 blocks 128k chunks
mdadm -D /dev/md2
/dev/md2:
        Version : 00.90
  Creation Time : Fri Oct 17 14:24:44 2008
     Raid Level : raid0
     Array Size : 488391680 (465.77 GiB 500.11 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Fri Oct 17 14:24:44 2008
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 128K

    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync
       1       8       65        1      active sync
2) Then I monitor the md device.
mdadm --monitor -1 /dev/md2
3) Then I pull one of the RAID 0's drives out of the system. At this
point, I expect the md device to become inactive.
DeviceDisappeared on /dev/md2 Wrong-Level
4) Oddly, no difference is reported in /proc/mdstat (see the quick check sketched after step 5):
md2 : active raid0 sde1[1] sdd1[0]
488391680 blocks 128k chunks
5) So I try to run I/O, which fails (obviously):
mkfs /dev/md2
mke2fs 1.40.8 (13-Mar-2008)
Warning: could not erase sector 2: Attempt to write block from
filesystem resulted in short write
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
30531584 inodes, 122097920 blocks
6104896 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
3727 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000
Warning: could not read block 0: Attempt to read block from filesystem
resulted in short read
Warning: could not erase sector 0: Attempt to write block from
filesystem resulted in short write
Writing inode tables: done
Writing superblocks and filesystem accounting information:
Warning, had trouble writing out superblocks.done
This filesystem will be automatically checked every 26 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
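As mentioned in step 4, here is roughly how I double-check that the kernel
really has dropped the member even though /proc/mdstat looks unchanged
(just a sketch; /dev/sde1 stands in for whichever member was pulled, and
the exact errors will vary):

cat /proc/mdstat              # md2 is still shown as "active raid0"
ls /sys/block/ | grep sd      # the pulled disk is no longer listed by the kernel
mdadm --examine /dev/sde1     # fails once the device (or its node) is gone
dmesg | tail                  # shows the disk removal and subsequent I/O errors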
Conclusion: Why does mdadm report a drive failure on a RAID 0 but not
mark the md device as inactive or otherwise failed?
Thanks!
-Thomas
* Re: RAID 0 md device still active after pulled drive
From: Neil Brown @ 2008-10-20 0:35 UTC
To: thomas62186218; +Cc: linux-raid
On Friday October 17, thomas62186218@aol.com wrote:
> Hi All,
>
> I have run into a most unusual behavior, where mdadm reports a RAID 0
> array that is missing a drive as "Active".
Not unusual at all. mdadm has always behaved this way.
There is nothing that 'md' can ever do about a failed drive in a
raid0, so it doesn't bother doing anything. At all.
As far as md is concerned, the drive is still an active part of the
array. It will still try to send appropriate IO requests to that
device. If they fail (e.g. because the device doesn't actually
exist), then md will send that error message back.
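For example, something like this will show it (illustrative only; the
exact error text depends on the kernel):

# Read across the array; requests that land on the missing member fail
# and the error goes straight back to the caller.
dd if=/dev/md2 of=/dev/null bs=1M count=512 iflag=direct
# The kernel log shows the low-level device errors that md passed up.
dmesg | tail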
>
>
> Conclusion: Why does mdadm report a drive failure on a RAID 0 but not
> mark the md device as inactive or otherwise failed?
where exactly did "mdadm report a drive failure" on the RAID0 ??
As always, if you think the documentation could be improved to reduce
the chance of this sort of confusion, or if the output of mdadm could
make something more clear, I am open to constructive suggestions (and
patches).
NeilBrown
* Re: RAID 0 md device still active after pulled drive
From: thomas62186218 @ 2008-10-20 0:56 UTC
To: neilb; +Cc: linux-raid
Hi Neil,
Thank you for the response and clarification. Regarding where mdadm
reported a drive failure, it came from mdadm --monitor (see points 2
and 3 in my original email, pasted below):
------------------------
2) Then I monitor the md device.
mdadm --monitor -1 /dev/md2
3) Then I pull one of the RAID 0's drives out of the system. At this
point, I expect the md device to become inactive.
DeviceDisappeared on /dev/md2 Wrong-Level
------------------------
So mdadm acknowledges that a device has disappeared from the RAID 0 md
(/dev/md2). Since RAID 0 has no redundancy, the array cannot function
without that device. There is therefore a mismatch in mdadm's reporting:
it reports the drive failure, but not the corresponding RAID 0 failure.
For consistency, my recommendation would therefore be to have mdadm
report the array as inactive at this point as well. Perhaps RAID 0 is a
special case, but I think it is worth having mdadm try a little harder to
report the state correctly, rather than not acknowledging the md failure
at all.
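In the meantime, a possible workaround on my side (only a sketch; the
script path and the decision to stop the array are my own choices, and
the alert-program arguments are as described in the mdadm man page) is to
let mdadm --monitor call a small hook that tears the array down when a
member vanishes:

#!/bin/sh
# /usr/local/sbin/md-alert.sh (hypothetical path)
# mdadm --monitor runs this with: <event> <md device> [<component device>]
EVENT="$1"
MDDEV="$2"
case "$EVENT" in
    DeviceDisappeared|Fail)
        # RAID 0 cannot be rebuilt, so stop the array to make the failure
        # obvious instead of letting I/O errors trickle out one by one.
        logger "md-alert: $EVENT on $MDDEV, stopping array"
        mdadm --stop "$MDDEV"
        ;;
esac

It would be started with something like
"mdadm --monitor --daemonise --delay=60 --program=/usr/local/sbin/md-alert.sh /dev/md2"
(note that --stop will refuse to run while the array is mounted or
otherwise in use).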
Best regards,
-Thomas
* Re: RAID 0 md device still active after pulled drive
From: Mario 'BitKoenig' Holbe @ 2008-10-20 9:21 UTC
To: linux-raid
Neil Brown <neilb@suse.de> wrote:
> There is nothing that 'md' can ever do about a failed drive in a
> raid0, so it doesn't bother doing anything. At all.
Hmmm, would it perhaps make sense to switch to fail-stop semantics here,
to reduce the chance of damaging the rest of the data?
On the other hand... perhaps this is something for the filesystem to
handle, the way errors=remount-ro does it on ext2.
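For ext2/ext3 that is just a mount option; as a generic example (the
mount point is made up):

# Remount read-only on the first I/O error instead of continuing to
# write to a half-broken device.
mount -o errors=remount-ro /dev/md2 /mnt/data
# or persistently via /etc/fstab:
# /dev/md2  /mnt/data  ext3  defaults,errors=remount-ro  0  2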
regards
Mario
--
File names are infinite in length where infinity is set to 255 characters.
-- Peter Collinson, "The Unix File System"