RAID 0 of Two RAID 5s Stays Up When Component RAID fails

Linux RAID subsystem development

* RAID 0 of Two RAID 5s Stays Up When Component RAID fails
@ 2013-03-14  1:33 Joel Young
  2013-03-14  3:21 ` Chris Murphy
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Joel Young @ 2013-03-14  1:33 UTC (permalink / raw)
  To: linux-raid

What should the output be for the following sequence:

dd if=/dev/zero of=i1 bs=1M count=100
dd if=/dev/zero of=i2 bs=1M count=100
dd if=/dev/zero of=i3 bs=1M count=100
dd if=/dev/zero of=i4 bs=1M count=100
dd if=/dev/zero of=i5 bs=1M count=100
dd if=/dev/zero of=i6 bs=1M count=100
losetup /dev/loop1 i1
losetup /dev/loop2 i2
losetup /dev/loop3 i3
losetup /dev/loop4 i4
losetup /dev/loop5 i5
losetup /dev/loop6 i6
mdadm --create /dev/md0 --level=5 --raid-devices=3 \
    /dev/loop1 /dev/loop2 /dev/loop3
mdadm --create /dev/md1 --level=5 --raid-devices=3 \
    /dev/loop4 /dev/loop5 /dev/loop6
mdadm --create /dev/md2 --level=0 --raid-devices=2 \
    /dev/md0 /dev/md1
mdadm /dev/md0 --fail /dev/loop0
mdadm /dev/md0 --fail /dev/loop1
mdadm --detail /dev/md0
mdadm --detail /dev/md2
dd if=/dev/zero of=/dev/md2 bs=1M

Shouldn't /dev/md2 be failed at this point?  Shouldn't the 
dd get an error?  No error was reported.  /dev/md2 is writable.  
Where are the writes going?

What am I missing?

This is running on Fedora 18.  We saw the problem with production arrays on
Fedora 17.

mdadm version: mdadm - v3.2.6 - 25th October 2012

kernel:  3.7.9-201.fc18.x86_64

Thanks,

Joel



* Re: RAID 0 of Two RAID 5s Stays Up When Component RAID fails
  2013-03-14  1:33 RAID 0 of Two RAID 5s Stays Up When Component RAID fails Joel Young
@ 2013-03-14  3:21 ` Chris Murphy
  2013-03-14  4:02   ` Joel Young
  2013-03-14  4:03 ` Joel Young
  2013-03-18 23:57 ` Joel Young
  2 siblings, 1 reply; 7+ messages in thread
From: Chris Murphy @ 2013-03-14  3:21 UTC (permalink / raw)
  To: Joel Young; +Cc: linux-raid


On Mar 13, 2013, at 7:33 PM, Joel Young <jdy@cryregarder.com> wrote:
> 
> mdadm /dev/md0 --fail /dev/loop0

loop0 isn't defined in any of your raids.


> mdadm /dev/md0 --fail /dev/loop1

This causes md0 to be degraded.

> mdadm --detail /dev/md0
> mdadm --detail /dev/md2
> dd if=/dev/zero of=/dev/md2 bs=1M
> 
> Shouldn't /dev/md2 be failed at this point?

No.


Chris Murphy



* Re: RAID 0 of Two RAID 5s Stays Up When Component RAID fails
  2013-03-14  3:21 ` Chris Murphy
@ 2013-03-14  4:02   ` Joel Young
  0 siblings, 0 replies; 7+ messages in thread
From: Joel Young @ 2013-03-14  4:02 UTC (permalink / raw)
  To: linux-raid

Chris Murphy <lists <at> colorremedies.com> writes:
> On Mar 13, 2013, at 7:33 PM, Joel Young <jdy <at> cryregarder.com> wrote:
>> mdadm /dev/md0 --fail /dev/loop0
> 
> loop0 isn't defined in any of your raids.

That was a typo when transcribing my shell history.  There is no loop0.

mdadm /dev/md0 --fail /dev/loop1 
mdadm /dev/md0 --fail /dev/loop2

> > Shouldn't /dev/md2 be failed at this point?
> 
> No.

No.  But it was.  Sorry for the confusion with the typo.

Joel




* Re: RAID 0 of Two RAID 5s Stays Up When Component RAID fails
  2013-03-14  1:33 RAID 0 of Two RAID 5s Stays Up When Component RAID fails Joel Young
  2013-03-14  3:21 ` Chris Murphy
@ 2013-03-14  4:03 ` Joel Young
  2013-03-14 18:09   ` Chris Murphy
  2013-03-18 23:57 ` Joel Young
  2 siblings, 1 reply; 7+ messages in thread
From: Joel Young @ 2013-03-14  4:03 UTC (permalink / raw)
  To: linux-raid

Joel Young <jdy <at> cryregarder.com> writes:
> mdadm /dev/md0 --fail /dev/loop0
> mdadm /dev/md0 --fail /dev/loop1

That is a typo.  It is:

mdadm /dev/md0 --fail /dev/loop1
mdadm /dev/md0 --fail /dev/loop2

which should be clear since there was no loop0 created.

Thanks for the on-list and off-list notifications of my sloppiness :-)

Joel
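
For reference, the corrected sequence can be collected into one script. This is only a sketch: `DRY_RUN`, `run`, and `reproduce` are hypothetical names that do not appear in the thread, and the `dd`/`losetup` steps are interleaved in a loop rather than listed one by one. With `DRY_RUN=1` (the default here) the commands are printed instead of executed, so nothing touches real devices by accident.

```shell
#!/bin/sh
# Sketch of the corrected reproduction from this thread.
# DRY_RUN and run() are hypothetical helpers, not part of mdadm.
# With DRY_RUN=1 (the default) each command is only printed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reproduce() {
    # Six 100 MiB backing files attached to loop1..loop6
    for n in 1 2 3 4 5 6; do
        run dd if=/dev/zero of="i$n" bs=1M count=100
        run losetup "/dev/loop$n" "i$n"
    done
    # Two three-device RAID 5 arrays, striped together as RAID 0
    run mdadm --create /dev/md0 --level=5 --raid-devices=3 \
        /dev/loop1 /dev/loop2 /dev/loop3
    run mdadm --create /dev/md1 --level=5 --raid-devices=3 \
        /dev/loop4 /dev/loop5 /dev/loop6
    run mdadm --create /dev/md2 --level=0 --raid-devices=2 \
        /dev/md0 /dev/md1
    # Fail two members of md0: loop1 and loop2 (not loop0)
    run mdadm /dev/md0 --fail /dev/loop1
    run mdadm /dev/md0 --fail /dev/loop2
    run mdadm --detail /dev/md0
    run mdadm --detail /dev/md2
    run dd if=/dev/zero of=/dev/md2 bs=1M
}
```

Calling `reproduce` prints the command list; running it as root with `DRY_RUN=0` would execute the commands, assuming loop1 through loop6 and md0 through md2 are free on the machine.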



* Re: RAID 0 of Two RAID 5s Stays Up When Component RAID fails
  2013-03-14  4:03 ` Joel Young
@ 2013-03-14 18:09   ` Chris Murphy
  2013-03-14 22:13     ` Joel Young
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Murphy @ 2013-03-14 18:09 UTC (permalink / raw)
  To: Joel Young; +Cc: linux-raid

On Mar 13, 2013, at 10:03 PM, Joel Young <jdy@cryregarder.com> wrote:

> Joel Young <jdy <at> cryregarder.com> writes:
>> mdadm /dev/md0 --fail /dev/loop0
>> mdadm /dev/md0 --fail /dev/loop1
> 
> That is a typo.  It is:
> 
> mdadm /dev/md0 --fail /dev/loop1
> mdadm /dev/md0 --fail /dev/loop2

In this case md0 is failed. And thus md2 is failed.

> dd if=/dev/zero of=/dev/md2 bs=1M

The block device is expected to still be there. I'm not sure what errors you should get, but if it were a real physical block device rather than a logical one, I'd expect there to be messages in dmesg.

If you had a mounted file system you were trying to write to, I think you'd get more immediate error messages in the shell, not just in dmesg.

Chris
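
One possible explanation for the quiet `dd`, which is an assumption and not something confirmed in this thread, is that a plain `dd` to a block device goes through the page cache, so the writes can return before the data reaches md2 and the failure only shows up later in the kernel log. The sketch below uses GNU dd's `conv=fsync`, which flushes the output before dd exits, so a failed flush is more likely to surface as a non-zero exit status. `write_checked` is a hypothetical helper name.

```shell
#!/bin/sh
# write_checked is a hypothetical helper name.
# Usage: write_checked TARGET COUNT_MIB
# Writes COUNT_MIB mebibytes of zeros and flushes them before dd exits,
# so a failed write or flush shows up in dd's exit status.
write_checked() {
    target="$1"
    count="$2"
    if dd if=/dev/zero of="$target" bs=1M count="$count" conv=fsync 2>/dev/null; then
        echo "write to $target completed"
    else
        status=$?
        echo "write to $target failed (dd exit status $status)" >&2
        return 1
    fi
}
```

On the test setup this could be called as `write_checked /dev/md2 100` (as root). Whether md2 then reports an error depends on the kernel, so treat the result as a diagnostic hint and still check dmesg.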


* Re: RAID 0 of Two RAID 5s Stays Up When Component RAID fails
  2013-03-14 18:09   ` Chris Murphy
@ 2013-03-14 22:13     ` Joel Young
  0 siblings, 0 replies; 7+ messages in thread
From: Joel Young @ 2013-03-14 22:13 UTC (permalink / raw)
  To: linux-raid

Chris Murphy <lists <at> colorremedies.com> writes:

> 
> 
> On Mar 13, 2013, at 10:03 PM, Joel Young <jdy <at> cryregarder.com> wrote:

> > mdadm /dev/md0 --fail /dev/loop1
> > mdadm /dev/md0 --fail /dev/loop2
> 
> In this case md0 is failed. And thus md2 is failed.
> 

Yes, md2 is broken, but it isn't marked failed according to:

[root@quickstep delme_images]# mdadm --detail /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Wed Mar 13 18:25:17 2013
     Raid Level : raid0
     Array Size : 406528 (397.07 MiB 416.28 MB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Wed Mar 13 18:25:17 2013
          State : clean 
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 512K

           Name : quickstep:2  (local to host quickstep)
           UUID : ca94a237:d63c25be:ff64fe0f:a41be44c
         Events : 0

    Number   Major   Minor   RaidDevice State
       0       9        1        0      active sync   /dev/md0
       1       9        3        1      active sync   /dev/md1

In /var/log/messages I get a bunch of buffer I/O errors on the device and
a warning in drivers/md/raid5.c get_active_stripe+0x683/0x7a0 [raid456]()

Shouldn't md2 have automatically failed?  Shouldn't writes immediately
error out instead of pretending to complete?

Joel
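
To make the point concrete, a script that watches only md2 would see nothing wrong. The sketch below pulls single fields out of `mdadm --detail` text; `md_detail_field` is a hypothetical helper, and the sample is an excerpt of the md2 output quoted above.

```shell
#!/bin/sh
# md_detail_field is a hypothetical helper: it reads `mdadm --detail`
# text on stdin and prints the value of the named field.
md_detail_field() {
    # Split each line at the first " : " and compare the trimmed key.
    awk -v key="$1" '
        {
            pos = index($0, " : ")
            if (pos == 0) next
            k = substr($0, 1, pos - 1)
            v = substr($0, pos + 3)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { print v; exit }
        }'
}

# Excerpt of the md2 output quoted in this message.
sample='          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0'

printf '%s\n' "$sample" | md_detail_field "State"           # prints: clean
printf '%s\n' "$sample" | md_detail_field "Failed Devices"  # prints: 0
```

On a live system the same helper could be fed directly, for example `mdadm --detail /dev/md2 | md_detail_field State`.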




* Re: RAID 0 of Two RAID 5s Stays Up When Component RAID fails
  2013-03-14  1:33 RAID 0 of Two RAID 5s Stays Up When Component RAID fails Joel Young
  2013-03-14  3:21 ` Chris Murphy
  2013-03-14  4:03 ` Joel Young
@ 2013-03-18 23:57 ` Joel Young
  2 siblings, 0 replies; 7+ messages in thread
From: Joel Young @ 2013-03-18 23:57 UTC (permalink / raw)
  To: linux-raid

Joel Young <jdy <at> cryregarder.com> writes:

> mdadm /dev/md0 --fail /dev/loop0
> mdadm /dev/md0 --fail /dev/loop1

I made a typo here when mailing the list.  It should have been:
mdadm /dev/md0 --fail /dev/loop1
mdadm /dev/md0 --fail /dev/loop2

In an off-list discussion with Neil Brown, he pointed out that
just as we don't fail a drive when a sector goes bad, we don't
fail a raid-0 if a component fails.

At this time, I don't follow that reasoning, but I can work
with it.

Joel
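
If the raid-0 is never marked failed, one way to work with that is to check the component arrays instead. The following is a sketch under several assumptions: the md sysfs attributes `level` and `degraded` under `/sys/block/<name>/md/` are used, `degraded` is assumed to be absent for levels without redundancy such as raid0, and a raid5 with two failed members is assumed to report `degraded` as 2. `SYSFS_ROOT`, `md_health`, and `components_ok` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helpers; SYSFS_ROOT can be overridden for testing.
SYSFS_ROOT="${SYSFS_ROOT:-/sys/block}"

# Prints "ok", "degraded", "broken", or "unknown" for one md array,
# based on the md sysfs attributes `level` and `degraded`.
md_health() {
    dir="$SYSFS_ROOT/$1/md"
    level="$(cat "$dir/level")"
    # Assumption: `degraded` may be absent for levels without
    # redundancy (e.g. raid0), so the array itself tells us nothing.
    if [ ! -r "$dir/degraded" ]; then
        echo "unknown"
        return 0
    fi
    missing="$(cat "$dir/degraded")"
    case "$level" in
        raid4|raid5) tolerated=1 ;;
        raid6)       tolerated=2 ;;
        *)           tolerated=0 ;;  # tolerance not modelled in this sketch
    esac
    if [ "$missing" -eq 0 ]; then
        echo "ok"
    elif [ "$tolerated" -gt 0 ] && [ "$missing" -gt "$tolerated" ]; then
        echo "broken"
    else
        echo "degraded"
    fi
}

# Prints one line per component and returns non-zero if any is broken.
components_ok() {
    rc=0
    for name in "$@"; do
        state="$(md_health "$name")"
        echo "$name: $state"
        if [ "$state" = "broken" ]; then
            rc=1
        fi
    done
    return "$rc"
}
```

For the layout in this thread the call would be `components_ok md0 md1`, treating a non-zero return as "md2 is unusable" even though md2 itself still reports clean.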

