From: NeilBrown <neilb@suse.de>
To: "Björn Englund" <be@smarteye.se>
Cc: linux-raid@vger.kernel.org
Subject: Re: raid6 recovery
Date: Sat, 15 Jan 2011 08:52:51 +1100
Message-ID: <20110115085251.77c8b03b@notabene.brown>
In-Reply-To: <4D3076DA.4020204@smarteye.se>

On Fri, 14 Jan 2011 17:16:26 +0100 Björn Englund <be@smarteye.se> wrote:

> Hi.
> 
> After a loss of communication with a drive in a 10 disk raid6 the disk
> was dropped out of the raid.
> 
> I added it again with
> mdadm /dev/md16 --add /dev/sdbq1
> 
> The array resynced and I used the xfs filesystem on top of the raid.
> 
> After a while I started noticing filesystem errors.
> 
> I did
> echo check > /sys/block/md16/md/sync_action
> 
> I got a lot of errors in /sys/block/md16/md/mismatch_cnt
> 
> I failed and removed the disk I added before from the array.
> 
> Did a check again (on the 9/10 array)
> echo check > /sys/block/md16/md/sync_action
> 
> No errors in /sys/block/md16/md/mismatch_cnt
> 
> Wiped the superblock from /dev/sdbq1 and added it again to the array.
> Let it finish resyncing.
> Did a check and once again a lot of errors.

That is obviously very bad.  After the recovery it may well report a large
number in mismatch_cnt, but if you then do a 'check' the number should go to
zero and stay there.

Did you interrupt the recovery at all, or did it run to completion without
any interference?   What kernel version are you using?
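
For reference, the kind of check cycle I mean (a sketch only, using the
md16 name from your report) looks like:

    cat /sys/block/md16/md/sync_action       # should read "idle" before starting
    echo check > /sys/block/md16/md/sync_action
    cat /proc/mdstat                         # watch the check progress
    cat /sys/block/md16/md/mismatch_cnt      # should end up at 0 and stay there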

> 
> The drive now has slot 10 instead of slot 3 which it had before the
> first error.

This is normal.  When you wiped the superblock, md thought it was a new device
and gave it a new number in the array.  It still filled the same role though.
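
The sequence you describe amounts to roughly the following (a sketch only,
assuming the same device names as in your report):

    mdadm /dev/md16 --fail /dev/sdbq1
    mdadm /dev/md16 --remove /dev/sdbq1
    mdadm --zero-superblock /dev/sdbq1       # wipe the md superblock
    mdadm /dev/md16 --add /dev/sdbq1         # md treats it as a new device, hence slot 10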


> 
> Examining each device (see below) shows 11 slots and one failed?
> (0, 1, 2, failed, 4, 5, 6, 7, 8, 9, 3) ?

These numbers are confusing, but they are correct and suggest the array is
whole and working.
Newer versions of mdadm are less confusing.
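
With a more recent mdadm you can confirm the role directly; for example
(a sketch only) something like:

    mdadm -E /dev/sdbq1 | grep -i role       # newer -E output reports the "Device Role" explicitly
    mdadm -D /dev/md16                       # the RaidDevice column shows it filling role 3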

I'm afraid I cannot suggest what the root problem is.  It seems like
something is seriously wrong with IO to the device, but if that were the case
you would expect other errors...
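
If you want to look for those, a couple of obvious places to check
(assuming smartmontools is installed) would be something like:

    dmesg | grep -i sdbq                     # kernel-level IO errors for that disk
    smartctl -a /dev/sdbq                    # the drive's own error log and SMART status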

NeilBrown


> 
> 
> Any idea what is going on?
> 
> mdadm --version
> mdadm - v2.6.9 - 10th March 2009
> 
> Centos 5.5
> 
> 
> mdadm -D /dev/md16
> /dev/md16:
>         Version : 1.01
>   Creation Time : Thu Nov 25 09:15:54 2010
>      Raid Level : raid6
>      Array Size : 7809792000 (7448.00 GiB 7997.23 GB)
>   Used Dev Size : 976224000 (931.00 GiB 999.65 GB)
>    Raid Devices : 10
>   Total Devices : 10
> Preferred Minor : 16
>     Persistence : Superblock is persistent
> 
>     Update Time : Fri Jan 14 16:22:10 2011
>           State : clean
>  Active Devices : 10
> Working Devices : 10
>  Failed Devices : 0
>   Spare Devices : 0
> 
>      Chunk Size : 256K
> 
>            Name : 16
>            UUID : fcd585d0:f2918552:7090d8da:532927c8
>          Events : 90
> 
>     Number   Major   Minor   RaidDevice State
>        0       8      145        0      active sync   /dev/sdj1
>        1      65        1        1      active sync   /dev/sdq1
>        2      65       17        2      active sync   /dev/sdr1
>       10      68       65        3      active sync   /dev/sdbq1
>        4      65       49        4      active sync   /dev/sdt1
>        5      65       65        5      active sync   /dev/sdu1
>        6      65      113        6      active sync   /dev/sdx1
>        7      65      129        7      active sync   /dev/sdy1
>        8      65       33        8      active sync   /dev/sds1
>        9      65      145        9      active sync   /dev/sdz1
> 
> 
> 
> mdadm -E /dev/sdj1
> /dev/sdj1:
>           Magic : a92b4efc
>         Version : 1.1
>     Feature Map : 0x0
>      Array UUID : fcd585d0:f2918552:7090d8da:532927c8
>            Name : 16
>   Creation Time : Thu Nov 25 09:15:54 2010
>      Raid Level : raid6
>    Raid Devices : 10
> 
>  Avail Dev Size : 1952448248 (931.00 GiB 999.65 GB)
>      Array Size : 15619584000 (7448.00 GiB 7997.23 GB)
>   Used Dev Size : 1952448000 (931.00 GiB 999.65 GB)
>     Data Offset : 264 sectors
>    Super Offset : 0 sectors
>           State : clean
>     Device UUID : 5db9c8f7:ce5b375e:757c53d0:04e89a06
> 
>     Update Time : Fri Jan 14 16:22:10 2011
>        Checksum : 1f17a675 - correct
>          Events : 90
> 
>      Chunk Size : 256K
> 
>     Array Slot : 0 (0, 1, 2, failed, 4, 5, 6, 7, 8, 9, 3)
>    Array State : Uuuuuuuuuu 1 failed
> 
> 
> 
> mdadm -E /dev/sdq1
> /dev/sdq1:
>           Magic : a92b4efc
>         Version : 1.1
>     Feature Map : 0x0
>      Array UUID : fcd585d0:f2918552:7090d8da:532927c8
>            Name : 16
>   Creation Time : Thu Nov 25 09:15:54 2010
>      Raid Level : raid6
>    Raid Devices : 10
> 
>  Avail Dev Size : 1952448248 (931.00 GiB 999.65 GB)
>      Array Size : 15619584000 (7448.00 GiB 7997.23 GB)
>   Used Dev Size : 1952448000 (931.00 GiB 999.65 GB)
>     Data Offset : 264 sectors
>    Super Offset : 0 sectors
>           State : clean
>     Device UUID : fb113255:fda391a6:7368a42b:1d6d4655
> 
>     Update Time : Fri Jan 14 16:22:10 2011
>        Checksum : 6ed7b859 - correct
>          Events : 90
> 
>      Chunk Size : 256K
> 
>     Array Slot : 1 (0, 1, 2, failed, 4, 5, 6, 7, 8, 9, 3)
>    Array State : uUuuuuuuuu 1 failed
> 
> 
>  mdadm -E /dev/sdr1
> /dev/sdr1:
>           Magic : a92b4efc
>         Version : 1.1
>     Feature Map : 0x0
>      Array UUID : fcd585d0:f2918552:7090d8da:532927c8
>            Name : 16
>   Creation Time : Thu Nov 25 09:15:54 2010
>      Raid Level : raid6
>    Raid Devices : 10
> 
>  Avail Dev Size : 1952448248 (931.00 GiB 999.65 GB)
>      Array Size : 15619584000 (7448.00 GiB 7997.23 GB)
>   Used Dev Size : 1952448000 (931.00 GiB 999.65 GB)
>     Data Offset : 264 sectors
>    Super Offset : 0 sectors
>           State : clean
>     Device UUID : afcb4dd8:2aa58944:40a32ed9:eb6178af
> 
>     Update Time : Fri Jan 14 16:22:10 2011
>        Checksum : 97a7a2d7 - correct
>          Events : 90
> 
>      Chunk Size : 256K
> 
>     Array Slot : 2 (0, 1, 2, failed, 4, 5, 6, 7, 8, 9, 3)
>    Array State : uuUuuuuuuu 1 failed
> 
> 
> mdadm -E /dev/sdbq1
> /dev/sdbq1:
>           Magic : a92b4efc
>         Version : 1.1
>     Feature Map : 0x0
>      Array UUID : fcd585d0:f2918552:7090d8da:532927c8
>            Name : 16
>   Creation Time : Thu Nov 25 09:15:54 2010
>      Raid Level : raid6
>    Raid Devices : 10
> 
>  Avail Dev Size : 1952448248 (931.00 GiB 999.65 GB)
>      Array Size : 15619584000 (7448.00 GiB 7997.23 GB)
>   Used Dev Size : 1952448000 (931.00 GiB 999.65 GB)
>     Data Offset : 264 sectors
>    Super Offset : 0 sectors
>           State : clean
>     Device UUID : 93c6ae7c:d8161356:7ada1043:d0c5a924
> 
>     Update Time : Fri Jan 14 16:22:10 2011
>        Checksum : 2ca5aa8f - correct
>          Events : 90
> 
>      Chunk Size : 256K
> 
>     Array Slot : 10 (0, 1, 2, failed, 4, 5, 6, 7, 8, 9, 3)
>    Array State : uuuUuuuuuu 1 failed
> 
> 
> and so on for the rest of the drives.

