linux-raid.vger.kernel.org archive mirror
* RAID 0 md device still active after pulled drive
@ 2008-10-18  0:29 thomas62186218
  2008-10-20  0:35 ` Neil Brown
  0 siblings, 1 reply; 4+ messages in thread
From: thomas62186218 @ 2008-10-18  0:29 UTC (permalink / raw)
  To: linux-raid

Hi All,

I have run into a most unusual behavior, where mdadm reports a RAID 0 
array that is missing a drive as "Active".

Environment:
Ubuntu 8.04 Hardy 64-bit
mdadm: 2.6.7
Dual socket quad-core CPU Intel server
8GB RAM
8 SATA II drives
LSI SAS1068 controller

Scenario:

1) I have a RAID 0 created from two drives:

md2 : active raid0 sde1[1] sdd1[0]
      488391680 blocks 128k chunks

mdadm -D /dev/md2
/dev/md2:
        Version : 00.90
  Creation Time : Fri Oct 17 14:24:44 2008
     Raid Level : raid0
     Array Size : 488391680 (465.77 GiB 500.11 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Fri Oct 17 14:24:44 2008
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 128K

    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync
       1       8       65        1      active sync
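
For reference, the array was created along these lines (the exact
invocation may have differed slightly, but it matches the details above):

mdadm --create /dev/md2 --level=0 --raid-devices=2 --chunk=128 /dev/sdd1 /dev/sde1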

2) Then I monitor the md device.

mdadm --monitor -1 /dev/md2
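
Here -1 is mdadm's --oneshot flag: check and report once, then exit. For
continuous monitoring one would instead leave something like the
following running in the background:

mdadm --monitor --delay=60 /dev/md2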

3) Then I pull one of the RAID 0's hard drives out of the system. At
this point, I expect the md device to become inactive.

DeviceDisappeared on /dev/md2 Wrong-Level
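
One can cross-check what the kernel itself saw at this point with, for
example:

dmesg | tail
ls -l /dev/sdd1 /dev/sde1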

4) Oddly, no difference is reported in /proc/mdstat:

md2 : active raid0 sde1[1] sdd1[0]
      488391680 blocks 128k chunks


5) So I try to run IO, which fails (obviously).

mkfs /dev/md2
mke2fs 1.40.8 (13-Mar-2008)
Warning: could not erase sector 2: Attempt to write block from 
filesystem resulted in short write
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
30531584 inodes, 122097920 blocks
6104896 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
3727 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
         32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 
2654208,
         4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 
78675968,
        102400000

Warning: could not read block 0: Attempt to read block from filesystem 
resulted in short read
Warning: could not erase sector 0: Attempt to write block from 
filesystem resulted in short write
Writing inode tables: done
Writing superblocks and filesystem accounting information:
Warning, had trouble writing out superblocks.done

This filesystem will be automatically checked every 26 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
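
The underlying I/O errors from the missing member also show up in the
kernel log; something like the following confirms it (the exact messages
depend on the driver):

dmesg | grep -i "i/o error"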


Conclusion: Why does mdadm report a drive failure on a RAID 0 but not 
mark the md device as Inactive or otherwise failed?


Thanks!
-Thomas


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: RAID 0 md device still active after pulled drive
  2008-10-18  0:29 RAID 0 md device still active after pulled drive thomas62186218
@ 2008-10-20  0:35 ` Neil Brown
  2008-10-20  0:56   ` thomas62186218
  2008-10-20  9:21   ` Mario 'BitKoenig' Holbe
  0 siblings, 2 replies; 4+ messages in thread
From: Neil Brown @ 2008-10-20  0:35 UTC (permalink / raw)
  To: thomas62186218; +Cc: linux-raid

On Friday October 17, thomas62186218@aol.com wrote:
> Hi All,
> 
> I have run into a most unusual behavior, where mdadm reports a RAID 0 
> array that is missing a drive as "Active".

Not unusual at all.  mdadm has always behaved this way.

There is nothing that 'md' can ever do about a failed drive in a
raid0, so it doesn't bother doing anything.  At all.
As far as md is concerned, the drive is still an active part of the
array.  It will still try to send appropriate IO requests to that
device.  If they fail (e.g. because the device doesn't actually
exist), then md will send that error message back.
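
You can see this directly: with your 128k chunk size, a read covering the
first two chunks touches both members, so (untested) something like

dd if=/dev/md2 of=/dev/null bs=128k count=2

should come back with an I/O error for whichever member is gone.
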
> 
> 
> Conclusion: Why does mdadm report a drive failure on a RAID 0 but not 
> mark the md device as Inactive or otherwise failed?

where exactly did "mdadm report a drive failure" on the RAID0 ??


As always,  if you think the documentation could be improved to reduce
the chance of this sort of confusion, or if the output of mdadm could
make something more clear, I am open to constructive suggestions (and
patches).

NeilBrown

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: RAID 0 md device still active after pulled drive
  2008-10-20  0:35 ` Neil Brown
@ 2008-10-20  0:56   ` thomas62186218
  2008-10-20  9:21   ` Mario 'BitKoenig' Holbe
  1 sibling, 0 replies; 4+ messages in thread
From: thomas62186218 @ 2008-10-20  0:56 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid

Hi Neil,

Thank you for the response and clarification. Regarding where mdadm 
reported a drive failure, it came from mdadm --monitor (see points 2 
and 3 in my original email, pasted below):

------------------------
2) Then I monitor the md device.

mdadm --monitor -1 /dev/md2

3) Then I pull one of the RAID 0's hard drives out of the system. At 
this point, I expect the md device to become inactive.

DeviceDisappeared on /dev/md2 Wrong-Level
------------------------

So mdadm acknowledges that a device has disappeared from the RAID 0 md 
device (/dev/md2), which of course means the RAID 0 must be inactive. 
There is a mismatch in mdadm's reporting here: it reports the drive 
failure, but not the corresponding RAID 0 failure.

For consistency, my recommendation would therefore be to have mdadm 
report the array as inactive at this point as well. Perhaps this needs a 
special case for RAID 0, but I think it is worth having mdadm try a 
little harder to report correctly, rather than not acknowledging the md 
failure at all.

Best regards,
-Thomas


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: RAID 0 md device still active after pulled drive
  2008-10-20  0:35 ` Neil Brown
  2008-10-20  0:56   ` thomas62186218
@ 2008-10-20  9:21   ` Mario 'BitKoenig' Holbe
  1 sibling, 0 replies; 4+ messages in thread
From: Mario 'BitKoenig' Holbe @ 2008-10-20  9:21 UTC (permalink / raw)
  To: linux-raid

Neil Brown <neilb@suse.de> wrote:
> There is nothing that 'md' can ever do about a failed drive in a
> raid0, so it doesn't bother doing anything.  At all.

Hmmm, would it perhaps make sense to switch to fail-stop semantics, to
reduce the chance of damaging the rest of the data?
On the other hand... perhaps this is something for the filesystem to
handle, the way errors=remount-ro does on ext2.
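
A userspace approximation of fail-stop for this case might look roughly
like the following untested sketch (device names taken from the report;
run periodically or from a hotplug hook; mdadm --stop only succeeds if
nothing holds the array open):

#!/bin/sh
# stop /dev/md2 if one of its members no longer answers
for dev in /dev/sdd1 /dev/sde1; do
    if ! blockdev --getsize64 "$dev" >/dev/null 2>&1; then
        mdadm --stop /dev/md2
        break
    fi
done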


regards
   Mario
-- 
File names are infinite in length where infinity is set to 255 characters.
                                -- Peter Collinson, "The Unix File System"


^ permalink raw reply	[flat|nested] 4+ messages in thread
