From: NeilBrown <neilb@suse.de>
To: Kurt Schmitt <kurt_schmitt@gmx.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: Check after raid6 failure
Date: Thu, 14 Jun 2012 22:52:39 +1000
Message-ID: <20120614225239.0dee7594@notabene.brown>
In-Reply-To: <20120614112955.286290@gmx.net>

On Thu, 14 Jun 2012 13:29:55 +0200 "Kurt Schmitt" <kurt_schmitt@gmx.de> wrote:

> Hello,
> 
> I am running a raid6 with 8 drives (no spares) and I am recovering after a controller failure that removed 3 of the drives (ATA Bus error). The state of the raid after this is obvious:
> 
> md7 : active raid6 sdg1[2] sdf1[8] sdd1[1] sdn1[7] sde1[0]
>       11721071616 blocks super 1.2 level 6, 512k chunk, algorithm 2 [8/5] [UUU___UU]
> 
> After exchanging the controller, I verified that the raid superblocks of the devices are still intact, but the superblock state was inconsistent. The removed drives were marked "active" and had a lower event count, whereas the other drives were "clean" with higher event count. I reassembled the array with this command:
> mdadm --assemble --force /dev/md7 /dev/sd[befghijk]1
> 
> This removed the faulty flags and reset the event counts. I switched the raid to --readonly immediately, and ran a filesystem check (which found a few non-critical errors, such as unused inodes, block bitmap differences and wrong free block counts). The detail/examine of the current state is below [2].
> 
> I have the following questions:
> 1. From the perspective of raid data integrity (parity), is it safe to continue operating the raid now and fix the file system errors and verify the actual data in the files?

Yes

> In particular, I have read at [1] that when skipping the initial sync, parity data on the disks will stay wrong even after it is rewritten. Does the same apply when doing assemble --force ?

That applies to RAID5, but not to RAID6 (in the current implementation).
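
A toy illustration of why such a difference can arise. It rests on an
assumption that is not stated above: that RAID5 may update parity by
read-modify-write, while the RAID6 code of this era recomputes parity from
all data blocks on every write. The sketch uses a single XOR parity over
one-byte "blocks"; real RAID6 also keeps a second (Q) syndrome, which is
left out here. The function names are made up for the example.

```shell
#!/bin/sh
# Toy model: one stripe, three one-byte data blocks, one XOR parity byte.
# Function names are hypothetical; this is not md's actual code.

# Read-modify-write: new_parity = old_parity ^ old_data ^ new_data.
# Any error already present in old_parity is carried over unchanged.
parity_rmw() {
    echo $(( $1 ^ $2 ^ $3 ))
}

# Reconstruct-write: recompute parity from every data block in the stripe.
# The previous parity value is never read, so a stale value cannot survive.
parity_rcw() {
    p=0
    for block in "$@"; do
        p=$(( p ^ block ))
    done
    echo "$p"
}

d0=17 d1=34 d2=68
stale=$(( $(parity_rcw "$d0" "$d1" "$d2") ^ 255 ))   # parity that was never synced
new_d1=51                                             # overwrite the middle block

after_rmw=$(parity_rmw "$stale" "$d1" "$new_d1")
after_rcw=$(parity_rcw "$d0" "$new_d1" "$d2")
echo "rmw=$after_rmw rcw=$after_rcw"
```

In this model the read-modify-write result (153) still differs from the
correct parity (102), while the reconstruct-write result matches it.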

> 
> 2. I have been trying to run a "check" sync_action on the raid (in read-only mode), to find out if there are mismatches, but it does not start. The sync_action is "idle" immediately after the "echo checked > sync_action" and /proc/mdstat does not report any change. There is nothing in dmesg either.

'check' will not work in read-only mode.  This is arguably a shortcoming.
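
For reference, a minimal sketch of requesting a check once the array is
read-write again. It assumes the standard md sysfs attributes sync_action
and mismatch_cnt under /sys/block/md7/md; the function names are invented
for this example. Note that the keyword listed in the kernel's md
documentation is "check", not "checked".

```shell
#!/bin/sh
# Sketch only: helpers take the md sysfs directory as $1, for example
# /sys/block/md7/md. Function names are invented for this example.

# Ask md to start a 'check' pass, but only if nothing else is running.
request_check() {
    state=$(cat "$1/sync_action") || return 1
    if [ "$state" != "idle" ]; then
        echo "sync_action is '$state', not starting a check" >&2
        return 1
    fi
    echo check > "$1/sync_action"
}

# Print the mismatch count that md reports after a check or repair pass.
show_mismatches() {
    cat "$1/mismatch_cnt"
}

# Intended use, as root, after leaving read-only mode:
#   mdadm --readwrite /dev/md7
#   request_check /sys/block/md7/md
#   cat /proc/mdstat                     # watch progress
#   show_mismatches /sys/block/md7/md    # once sync_action is idle again
```

Passing the sysfs directory as an argument keeps the helpers usable for
arrays other than md7 and lets them be tried against a scratch directory
before touching a real array.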

> 
> 3. What other steps can / should I take before continuing raid usage (read-write), especially repair on the file system level?

The file system and the RAID can be repaired independently, so just go ahead;
it all looks good (unless that 3.2.2 kernel is from Ubuntu, in which case you
might need to be careful. What is the full "uname -a" output?).

NeilBrown

> 
> 
> Thank you,
> 
> Kurt
> 
> [1] https://raid.wiki.kernel.org/index.php/Initial_Array_Creation#raid5
> 
> [2] I am running a 3.2.2 kernel with mdadm 3.1.4.
> 
> The current state of the raid is displayed below:
> md7 : active (read-only) raid6 sdf1[0] sdj1[7] sdg1[8] sdk1[6] sdb1[5] sdi1[4] sdh1[2] sde1[1]
>       11721071616 blocks super 1.2 level 6, 512k chunk, algorithm 2 [8/8] [UUUUUUUU]
> 
> mdadm --detail /dev/md7 
> /dev/md7:
>         Version : 1.2
>   Creation Time : <redacted>
>      Raid Level : raid6
>      Array Size : 11721071616 (11178.09 GiB 12002.38 GB)
>   Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
>    Raid Devices : 8
>   Total Devices : 8
>     Persistence : Superblock is persistent
> 
>     Update Time : Mon Jun 11 19:18:33 2012
>           State : clean
>  Active Devices : 8
> Working Devices : 8
>  Failed Devices : 0
>   Spare Devices : 0
> 
>          Layout : left-symmetric
>      Chunk Size : 512K
> 
>            Name : <redacted>
>            UUID : <redacted>
>          Events : 79713
> 
>     Number   Major   Minor   RaidDevice State
>        0       8       81        0      active sync   /dev/sdf1
>        1       8       65        1      active sync   /dev/sde1
>        2       8      113        2      active sync   /dev/sdh1
>        4       8      129        3      active sync   /dev/sdi1
>        5       8       17        4      active sync   /dev/sdb1
>        6       8      161        5      active sync   /dev/sdk1
>        8       8       97        6      active sync   /dev/sdg1
>        7       8      145        7      active sync   /dev/sdj1
> 
> 
> 
> /dev/sdb1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : <redacted>
>            Name : <redacted>
>   Creation Time : <redacted>
>      Raid Level : raid6
>    Raid Devices : 8
> 
>  Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
>      Array Size : 23442143232 (11178.09 GiB 12002.38 GB)
>   Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : active
>     Device UUID : <redacted>
> 
>     Update Time : Mon Jun 11 10:13:08 2012
>        Checksum : d207eb78 - correct
>          Events : 79712
> 
>          Layout : left-symmetric
>      Chunk Size : 512K
> 
>    Device Role : Active device 4
>    Array State : AAAAAAAA ('A' == active, '.' == missing)
> 
> /dev/sde1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : <redacted>
>            Name : <redacted>
>   Creation Time : <redacted>
>      Raid Level : raid6
>    Raid Devices : 8
> 
>  Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
>      Array Size : 23442143232 (11178.09 GiB 12002.38 GB)
>   Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : <redacted>
> 
>     Update Time : Mon Jun 11 19:18:33 2012
>        Checksum : cea4ea72 - correct
>          Events : 79713
> 
>          Layout : left-symmetric
>      Chunk Size : 512K
> 
>    Device Role : Active device 1
>    Array State : AAA...AA ('A' == active, '.' == missing)
> 
> /dev/sdf1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : <redacted>
>            Name : <redacted>
>   Creation Time : <redacted>
>      Raid Level : raid6
>    Raid Devices : 8
> 
>  Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
>      Array Size : 23442143232 (11178.09 GiB 12002.38 GB)
>   Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : <redacted>
> 
>     Update Time : Mon Jun 11 19:18:33 2012
>        Checksum : 73e3de3b - correct
>          Events : 79713
> 
>          Layout : left-symmetric
>      Chunk Size : 512K
> 
>    Device Role : Active device 0
>    Array State : AAAAAAAA ('A' == active, '.' == missing)
> 
> /dev/sdg1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : <redacted>
>            Name : <redacted>
>   Creation Time : <redacted>
>      Raid Level : raid6
>    Raid Devices : 8
> 
>  Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
>      Array Size : 23442143232 (11178.09 GiB 12002.38 GB)
>   Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : <redacted>
> 
>     Update Time : Mon Jun 11 19:18:33 2012
>        Checksum : b7ef499c - correct
>          Events : 79713
> 
>          Layout : left-symmetric
>      Chunk Size : 512K
> 
>    Device Role : Active device 6
>    Array State : AAA...AA ('A' == active, '.' == missing)
> 
> /dev/sdh1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : <redacted>
>            Name : <redacted>
>   Creation Time : <redacted>
>      Raid Level : raid6
>    Raid Devices : 8
> 
>  Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
>      Array Size : 23442143232 (11178.09 GiB 12002.38 GB)
>   Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : <redacted>
> 
>     Update Time : Mon Jun 11 19:18:33 2012
>        Checksum : c75d3da5 - correct
>          Events : 79713
> 
>          Layout : left-symmetric
>      Chunk Size : 512K
> 
>    Device Role : Active device 2
>    Array State : AAA...AA ('A' == active, '.' == missing)
> 
> /dev/sdi1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : <redacted>
>            Name : <redacted>
>   Creation Time : <redacted>
>      Raid Level : raid6
>    Raid Devices : 8
> 
>  Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
>      Array Size : 23442143232 (11178.09 GiB 12002.38 GB)
>   Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : active
>     Device UUID : <redacted>
> 
>     Update Time : Mon Jun 11 10:13:08 2012
>        Checksum : 1a292902 - correct
>          Events : 79712
> 
>          Layout : left-symmetric
>      Chunk Size : 512K
> 
>    Device Role : Active device 3
>    Array State : AAAAAAAA ('A' == active, '.' == missing)
> 
> /dev/sdj1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : <redacted>
>            Name : <redacted>
>   Creation Time : <redacted>
>      Raid Level : raid6
>    Raid Devices : 8
> 
>  Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
>      Array Size : 23442143232 (11178.09 GiB 12002.38 GB)
>   Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : <redacted>
> 
>     Update Time : Mon Jun 11 19:18:33 2012
>        Checksum : 6f7b11b7 - correct
>          Events : 79713
> 
>          Layout : left-symmetric
>      Chunk Size : 512K
> 
>    Device Role : Active device 7
>    Array State : AAA...AA ('A' == active, '.' == missing)
> 
> /dev/sdk1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : <redacted>
>            Name : <redacted>
>   Creation Time : <redacted>
>      Raid Level : raid6
>    Raid Devices : 8
> 
>  Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
>      Array Size : 23442143232 (11178.09 GiB 12002.38 GB)
>   Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : active
>     Device UUID : <redacted>
> 
>     Update Time : Mon Jun 11 10:13:08 2012
>        Checksum : a2773548 - correct
>          Events : 79712
> 
>          Layout : left-symmetric
>      Chunk Size : 512K
> 
>    Device Role : Active device 5
>    Array State : AAAAAAAA ('A' == active, '.' == missing)
> 

