From: NeilBrown <neilb@suse.de>
To: Piotr Legiecki <piotrlg@pum.edu.pl>
Cc: linux-raid@vger.kernel.org
Subject: Re: RAID10 failed with two disks
Date: Mon, 22 Aug 2011 22:01:29 +1000
Message-ID: <20110822220129.5b2928ff@notabene.brown>
In-Reply-To: <4E5240BE.1050807@pum.edu.pl>

On Mon, 22 Aug 2011 13:42:54 +0200 Piotr Legiecki <piotrlg@pum.edu.pl> wrote:

> >> mdadm --examine /dev/sda1
> >> /dev/sda1:
> >>            Magic : a92b4efc
> >>          Version : 00.90.00
> >>             UUID : fab2336d:71210520:990002ab:4fde9f0c (local to host bez)
> >>    Creation Time : Mon Aug 22 10:40:36 2011
> >>       Raid Level : raid10
> >>    Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
> >>       Array Size : 1953519872 (1863.02 GiB 2000.40 GB)
> >>     Raid Devices : 4
> >>    Total Devices : 4
> >> Preferred Minor : 4
> >>
> >>      Update Time : Mon Aug 22 10:40:36 2011
> >>            State : clean
> >>   Active Devices : 2
> >> Working Devices : 2
> >>   Failed Devices : 2
> >>    Spare Devices : 0
> >>         Checksum : d4ba8390 - correct
> >>           Events : 1
> >>
> >>           Layout : near=2, far=1
> >>       Chunk Size : 64K
> >>
> >>        Number   Major   Minor   RaidDevice State
> >> this     0       8        1        0      active sync   /dev/sda1
> >>
> >>     0     0       8        1        0      active sync   /dev/sda1
> >>     1     1       8       17        1      active sync   /dev/sdb1
> >>     2     2       0        0        2      faulty
> >>     3     3       0        0        3      faulty
> >>
> >> The last two disks (failed ones) are sde1 and sdf1.
> >>
> >> So do I have any chance of getting the array running, or is it dead?
> > 
> > Possible.
> > Report "mdadm --examine" of all devices that you believe should be part of
> > the array.
> 
> /dev/sdb1:
>            Magic : a92b4efc
>          Version : 00.90.00
>             UUID : fab2336d:71210520:990002ab:4fde9f0c (local to host bez)
>    Creation Time : Mon Aug 22 10:40:36 2011
>       Raid Level : raid10
>    Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>       Array Size : 1953519872 (1863.02 GiB 2000.40 GB)
>     Raid Devices : 4
>    Total Devices : 4
> Preferred Minor : 4
> 
>      Update Time : Mon Aug 22 10:40:36 2011
>            State : clean
>   Active Devices : 2
> Working Devices : 2
>   Failed Devices : 2
>    Spare Devices : 0
>         Checksum : d4ba83a2 - correct
>           Events : 1
> 
>           Layout : near=2, far=1
>       Chunk Size : 64K
> 
>        Number   Major   Minor   RaidDevice State
> this     1       8       17        1      active sync   /dev/sdb1
> 
>     0     0       8        1        0      active sync   /dev/sda1
>     1     1       8       17        1      active sync   /dev/sdb1
>     2     2       0        0        2      faulty
>     3     3       0        0        3      faulty
> 
> 
> 
> /dev/sde1:
>            Magic : a92b4efc
>          Version : 00.90.00
>             UUID : 157a7440:4502f6db:990002ab:4fde9f0c (local to host bez)
>    Creation Time : Fri Jun  3 12:18:33 2011
>       Raid Level : raid10
>    Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>       Array Size : 1953519872 (1863.02 GiB 2000.40 GB)
>     Raid Devices : 4
>    Total Devices : 4
> Preferred Minor : 4
> 
>      Update Time : Sat Aug 20 03:06:27 2011
>            State : clean
>   Active Devices : 4
> Working Devices : 4
>   Failed Devices : 0
>    Spare Devices : 0
>         Checksum : c2f848c2 - correct
>           Events : 24
> 
>           Layout : near=2, far=1
>       Chunk Size : 64K
> 
>        Number   Major   Minor   RaidDevice State
> this     2       8       65        2      active sync   /dev/sde1
> 
>     0     0       8        1        0      active sync   /dev/sda1
>     1     1       8       17        1      active sync   /dev/sdb1
>     2     2       8       65        2      active sync   /dev/sde1
>     3     3       8       81        3      active sync   /dev/sdf1
> 
> /dev/sdf1:
>            Magic : a92b4efc
>          Version : 00.90.00
>             UUID : 157a7440:4502f6db:990002ab:4fde9f0c (local to host bez)
>    Creation Time : Fri Jun  3 12:18:33 2011
>       Raid Level : raid10
>    Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>       Array Size : 1953519872 (1863.02 GiB 2000.40 GB)
>     Raid Devices : 4
>    Total Devices : 4
> Preferred Minor : 4
> 
>      Update Time : Sat Aug 20 03:06:27 2011
>            State : clean
>   Active Devices : 4
> Working Devices : 4
>   Failed Devices : 0
>    Spare Devices : 0
>         Checksum : c2f848d4 - correct
>           Events : 24
> 
>           Layout : near=2, far=1
>       Chunk Size : 64K
> 
>        Number   Major   Minor   RaidDevice State
> this     3       8       81        3      active sync   /dev/sdf1
> 
>     0     0       8        1        0      active sync   /dev/sda1
>     1     1       8       17        1      active sync   /dev/sdb1
>     2     2       8       65        2      active sync   /dev/sde1
>     3     3       8       81        3      active sync   /dev/sdf1

It looks like sde1 and sdf1 are unchanged since the "failure" which happened
shortly after 3am on Saturday.  So the data on them is probably good.

It looks like someone (you?) tried to create a new array on sda1 and sdb1,
thus destroying the old metadata (but probably not the data).  I'm surprised
that mdadm would have let you create a RAID10 with just 2 devices...  Is
that what happened, or was it something else?
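The diagnosis above rests on a handful of fields from each superblock: UUID, Update Time, Events and the slot on the "this" line. A minimal bash/awk sketch that lines them up side by side, assuming `mdadm --examine` output for 0.90 superblocks in exactly the format quoted above; the function name `md_examine_summary` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: condense the output of `mdadm --examine` (0.90
# superblocks, in the format quoted in this thread) into one line per device:
#   device  slot  events  uuid  update-time
# Reads examine output for one or more devices on stdin; sorts by slot.
md_examine_summary() {
  awk '
    # A device header looks like "/dev/sde1:" on a line of its own.
    /^\/dev\/[^ ]+:$/ {
      dev = substr($1, 1, length($1) - 1)
      uuid = upd = ev = ""          # reset fields for the new device
      next
    }
    /^ *UUID :/        { uuid = $3 }
    /^ *Events :/      { ev = $3 }
    /^ *Update Time :/ { sub(/^ *Update Time : /, ""); upd = $0 }
    # "this" row: this Number Major Minor RaidDevice State...
    /^this / { printf "%s\t%s\t%s\t%s\t%s\n", dev, $5, ev, uuid, upd }
  ' | sort -t $'\t' -k2,2n
}
```

Used as `mdadm --examine /dev/sd{a,b,e,f}1 | md_examine_summary`. On the outputs quoted above, sda1 and sdb1 would show the new UUID (fab2336d:...) with Events 1, sde1 and sdf1 the old UUID (157a7440:...) with Events 24, and the slot column gives the device order that a re-create has to follow.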

Anyway it looks as though if you run the command:

  mdadm --create /dev/md4 -l10 -n4 -e 0.90 /dev/sd{a,b,e,f}1 --assume-clean

there is a reasonable chance that /dev/md4 would have all your data.
You should then
   fsck -fn /dev/md4
to check that it is all OK.  If it is you can
   echo check > /sys/block/md4/md/sync_action
to check if the mirrors are consistent.  When it has finished
   cat /sys/block/md4/md/mismatch_cnt
will show '0' if all is consistent.

If it is not zero but a small number, you can feel safe doing
    echo repair > /sys/block/md4/md/sync_action
to fix it up.
If it is a big number... that would be troubling.
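The check step can be wrapped so that mismatch_cnt is only read once the pass is over. A minimal bash sketch, assuming the usual md sysfs files (sync_action reads "idle" when no pass is running); the function name and the SYSFS_MD override, which exists only so the function can be pointed at a test directory, are hypothetical. It just prints the count; deciding whether it is "small" is left to judgement, as described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start a "check" pass on an md array, wait until
# sync_action reads "idle" again, then print mismatch_cnt.
# SYSFS_MD (hypothetical) overrides the default /sys/block/<md>/md directory.
md_check_and_report() {
  if [ -z "$1" ]; then
    echo "usage: md_check_and_report mdN [poll-seconds]" >&2
    return 2
  fi
  local md=$1
  local poll=${2:-60}
  local dir=${SYSFS_MD:-/sys/block/$md/md}

  # Writing "check" fails (e.g. EBUSY) if another pass is already running.
  echo check > "$dir/sync_action" || return 1
  # Poll until the check pass has completed.
  while [ "$(cat "$dir/sync_action")" != "idle" ]; do
    sleep "$poll"
  done
  cat "$dir/mismatch_cnt"
}
```

Example: `md_check_and_report md4` polls once a minute; a full check of a 2 TB array can take hours, so a long poll interval is usually fine.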


> 
> 
> smartd reported that the sde and sdf disks had failed, but after rebooting
> it no longer complains.
> 
> You say adjacent disks must be healthy for RAID10. So in my situation I 
> have adjacent disks dead (sde and sdf). It does not look good.
> 
> And does the layout (near, far, etc.) affect this rule that adjacent disks
> must be healthy?

I didn't say adjacent disks must be healthy.  I said you cannot have
adjacent disks both failing.  This is not affected by near/far.
It is a bit more subtle than that, though.  It is OK for the 2nd and 3rd to
both fail, but not the 1st and 2nd, or the 3rd and 4th.  In this array
(near=2 on 4 devices) the 1st and 2nd devices mirror each other, as do the
3rd and 4th.
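The pairing rule can be checked with a small model of the near layout this array uses (near=2, far=1): the copies of each chunk sit on consecutive devices, and the next chunk starts on the device after them, wrapping around. A bash sketch under that assumption only (far and offset layouts are not modelled; the function name is hypothetical), with slots numbered from 0 as in the RaidDevice column:

```shell
#!/usr/bin/env bash
# Hypothetical model of md RAID10 "near" placement: chunk c has its k copies
# on devices (c*k + j) mod n for j = 0..k-1.
# Usage: raid10_near_survives NDEVS NCOPIES [FAILED_SLOT...]
# Prints "ok" if every chunk keeps at least one copy, else "lost" (status 1).
raid10_near_survives() {
  local n=$1 k=$2
  shift 2
  local -a failed=()
  local s c j dev alive
  for s in "$@"; do failed[$s]=1; done
  # The placement pattern repeats every n chunks, so checking n is enough.
  for ((c = 0; c < n; c++)); do
    alive=0
    for ((j = 0; j < k; j++)); do
      dev=$(( (c * k + j) % n ))
      if [ -z "${failed[$dev]}" ]; then alive=1; fi
    done
    if [ "$alive" -eq 0 ]; then
      echo lost
      return 1
    fi
  done
  echo ok
}
```

For example, `raid10_near_survives 4 2 1 2` prints `ok` (2nd and 3rd devices gone), while `raid10_near_survives 4 2 2 3` prints `lost`: slots 2 and 3 (sde1 and sdf1 here) hold both copies of half the chunks, so md cannot run with both marked faulty even if their data is in fact intact. With an odd number of devices, e.g. `raid10_near_survives 3 2 0 2`, any two failures are fatal.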

NeilBrown


> 
> 
> Regards
> P.


Thread overview: 7+ messages
2011-08-22 10:39 RAID10 failed with two disks Piotr Legiecki
2011-08-22 11:09 ` NeilBrown
2011-08-22 11:42   ` Piotr Legiecki
2011-08-22 12:01     ` NeilBrown [this message]
2011-08-22 12:52       ` Piotr Legiecki
2011-08-22 23:56         ` NeilBrown
2011-08-23  8:35           ` Piotr Legiecki
