From mboxrd@z Thu Jan 1 00:00:00 1970
From: Oliver Schinagl
Subject: Re: Help, array corrupted after clean shutdown.
Date: Sat, 06 Apr 2013 20:01:49 +0200
Message-ID: <5160630D.9000508@schinagl.nl>
References: <5160060B.8020603@schinagl.nl> <51603BF2.404@schinagl.nl>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
Sender: linux-raid-owner@vger.kernel.org
To: Durval Menezes
Cc: Linux RAID
List-Id: linux-raid.ids

On 04/06/13 19:44, Durval Menezes wrote:
> Hi Oliver,
>
> Seems most of your problems are filesystem corruption (the extN family
> is well known for lack of robustness).
>
> I would try to mount the filesystem read-only (without fsck) and copy
> off as much data as possible... Then fsck and try to copy the rest.
>
> Good luck.

It fails to mount ;)

However, how can I make sure that the array itself is not corrupt (while
it is degraded)? That way, I can at least try my luck with the ext4
tools.

>
> --
> Durval.
>
> On Apr 6, 2013 12:13 PM, "Oliver Schinagl" > wrote:
>
> On 04/06/13 17:06, Durval Menezes wrote:
>
> Oliver,
>
> What file system? LVM or direct on the MD device?
>
> Sorry, should have mentioned this.
>
> I have 4 1.5 TB sata drives, connected to the onboard sata controller.
>
> I have made 1 GPT partition on top of each drive and then made a
> raid5 array on top of those devices:
>
> md101 : active (read-only) raid5 sdd1[0] sde1[4] sdf1[1]
> 4395413760 blocks super 1.2 level 5, 256k chunk, algorithm 2
> [4/3] [UU_U]
>
> I then formatted /dev/md101 with ext4.
>
> Tune2fs still happily runs on /dev/md101, but of course that doesn't
> mean anything.
>
> riley tmp # tune2fs -l /dev/md101
> tune2fs 1.42 (29-Nov-2011)
> Filesystem volume name: data01
> Last mounted on: /tank/01
> Filesystem UUID: 9c812d61-96ce-4b71-9763-b77e8b9618d1
> Filesystem magic number: 0xEF53
> Filesystem revision #: 1 (dynamic)
> Filesystem features: has_journal ext_attr resize_inode
> dir_index filetype extent flex_bg sparse_super large_file huge_file
> uninit_bg dir_nlink extra_isize
> Filesystem flags: signed_directory_hash
> Default mount options: (none)
> Filesystem state: not clean
> Errors behavior: Continue
> Filesystem OS type: Linux
> Inode count: 274718720
> Block count: 1098853440
> Reserved block count: 0
> Free blocks: 228693396
> Free inodes: 274387775
> First block: 0
> Block size: 4096
> Fragment size: 4096
> Reserved GDT blocks: 762
> Blocks per group: 32768
> Fragments per group: 32768
> Inodes per group: 8192
> Inode blocks per group: 512
> RAID stride: 64
> RAID stripe width: 192
> Flex block group size: 16
> Filesystem created: Wed Apr 28 16:42:58 2010
> Last mount time: Tue May 4 17:14:48 2010
> Last write time: Sat Apr 6 11:45:57 2013
> Mount count: 10
> Maximum mount count: 32
> Last checked: Wed Apr 28 16:42:58 2010
> Check interval: 15552000 (6 months)
> Next check after: Mon Oct 25 16:42:58 2010
> Lifetime writes: 3591 GB
> Reserved blocks uid: 0 (user root)
> Reserved blocks gid: 0 (group root)
> First inode: 11
> Inode size: 256
> Required extra isize: 28
> Desired extra isize: 28
> Journal inode: 8
> First orphan inode: 17
> Default directory hash: half_md4
> Directory Hash Seed: f1248a94-5a6a-4e4a-af8a-68b019d13ef6
> Journal backup: inode blocks
>
> --
> Durval.
>
> On Apr 6, 2013 8:23 AM, "Oliver Schinagl" > > >> wrote:
>
> Hi,
>
> I've had a power failure today, to which my UPS responded nicely and
> made my server shut down normally. One would expect everything is
> well, right?
> The array, as far as I know, was operating without problems before the
> shutdown; all 4 devices were normally online. mdadm sends me an e-mail
> if something is wrong, and so does smartctl.
>
> The first thing I noticed was that I had 2 (S) drives for /dev/md101.
> I thus started examining things. First I thought that it was some
> mdadm weirdness, where it failed to assemble the array with all
> components. mdadm -A /dev/md101 /dev/sd[cdef]1 failed and gave the
> same result. Something was really wrong.
>
> I checked and compared the output of mdadm --examine on all drives
> (like -Evvvs below) and found that /dev/sdc1's events count was wrong.
> /dev/sdf1 and /dev/sdd1 matched (and later sde1 too, but more on that
> in a sec). So sdc1 may have been dropped from the array without me
> knowing it, unlikely but possible. The odd thing is the huge
> difference in event counts, while all four are marked as ACTIVE.
>
> So then on to sde1; why was it failing on that? The gpt table was
> completely gone. 00000. Gone. I used hexdump to examine the drive
> further, and at 0x00041000 there was the mdraid table, as one would
> expect. Good, so it looks like only the gpt has been wiped for some
> mysterious reason. Re-creating the gpt quickly revealed mdadm's
> information was still correct (as can be seen below).
>
> So ignoring sdc1 and assembling the array as is should be fine?
> Right? No.
> mdadm -A /dev/md101 /dev/sd[def]1 worked without error.
>
> I always do a fsck before and after a reboot (unless of course I
> can't do the shutdown fsck) and verify /proc/mdstat after a boot. So
> before mounting, as always, I tried to run fsck /dev/md101 -C -; but
> that came up with tons of errors. I didn't fix anything and aborted.
>
> And here we are now. I can't just copy the entire disk (1.5TB per
> disk) and 'experiment'; I don't have 4 spare disks.
> The first thing I would want to try is mdadm -A /dev/sd[cdf]1 --force
> (leave out the possibly corrupted sde1) and see what that does.
>
> All that said, when I did the assemble with the 'guessed' 3 correct
> drives, it did of course increase the events count. sdc1 of course
> didn't partake in this. Assuming that it is in sync with the rest,
> what is the worst that can happen? And does the --read-only flag
> protect against it?
>
> Linux riley 3.7.4-gentoo #2 SMP Tue Feb 5 16:20:59 CET 2013 x86_64
> AMD Phenom(tm) II X4 905e Processor AuthenticAMD GNU/Linux
>
> riley tmp # mdadm --version
> mdadm - v3.1.4 - 31st August 2010
>
> riley tmp # mdadm -Evvvvs
> /dev/sdf1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 2becc012:2d317133:2447784c:1aab300d
> Name : riley:data01 (local to host riley)
> Creation Time : Tue Apr 27 18:03:37 2010
> Raid Level : raid5
> Raid Devices : 4
>
> Avail Dev Size : 2930276351 (1397.26 GiB 1500.30 GB)
> Array Size : 8790827520 (4191.79 GiB 4500.90 GB)
> Used Dev Size : 2930275840 (1397.26 GiB 1500.30 GB)
> Data Offset : 272 sectors
> Super Offset : 8 sectors
> State : clean
> Device UUID : 97877935:04c16c5f:0746cb98:63bffb4c
>
> Update Time : Sat Apr 6 11:46:03 2013
> Checksum : b585717a - correct
> Events : 512993
>
> Layout : left-symmetric
> Chunk Size : 256K
>
> Device Role : Active device 1
> Array State : AA.A ('A' == active, '.' == missing)
> mdadm: No md superblock detected on /dev/sdf.
> /dev/sde1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 2becc012:2d317133:2447784c:1aab300d
> Name : riley:data01 (local to host riley)
> Creation Time : Tue Apr 27 18:03:37 2010
> Raid Level : raid5
> Raid Devices : 4
>
> Avail Dev Size : 2930275847 (1397.26 GiB 1500.30 GB)
> Array Size : 8790827520 (4191.79 GiB 4500.90 GB)
> Used Dev Size : 2930275840 (1397.26 GiB 1500.30 GB)
> Data Offset : 776 sectors
> Super Offset : 8 sectors
> State : clean
> Device UUID : 3f48d5a8:e3ee47a1:23c8b895:addd3dd0
>
> Update Time : Sat Apr 6 11:46:03 2013
> Checksum : eaec006b - correct
> Events : 512993
>
> Layout : left-symmetric
> Chunk Size : 256K
>
> Device Role : Active device 3
> Array State : AA.A ('A' == active, '.' == missing)
> mdadm: No md superblock detected on /dev/sde.
> /dev/sdd1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 2becc012:2d317133:2447784c:1aab300d
> Name : riley:data01 (local to host riley)
> Creation Time : Tue Apr 27 18:03:37 2010
> Raid Level : raid5
> Raid Devices : 4
>
> Avail Dev Size : 2930276351 (1397.26 GiB 1500.30 GB)
> Array Size : 8790827520 (4191.79 GiB 4500.90 GB)
> Used Dev Size : 2930275840 (1397.26 GiB 1500.30 GB)
> Data Offset : 272 sectors
> Super Offset : 8 sectors
> State : clean
> Device UUID : 236f6c48:2a1bcf6b:a7d7d861:53950637
>
> Update Time : Sat Apr 6 11:46:03 2013
> Checksum : 87f31abb - correct
> Events : 512993
>
> Layout : left-symmetric
> Chunk Size : 256K
>
> Device Role : Active device 0
> Array State : AA.A ('A' == active, '.' == missing)
> mdadm: No md superblock detected on /dev/sdd.
> /dev/sdc1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 2becc012:2d317133:2447784c:1aab300d
> Name : riley:data01 (local to host riley)
> Creation Time : Tue Apr 27 18:03:37 2010
> Raid Level : raid5
> Raid Devices : 4
>
> Avail Dev Size : 2930276351 (1397.26 GiB 1500.30 GB)
> Array Size : 8790827520 (4191.79 GiB 4500.90 GB)
> Used Dev Size : 2930275840 (1397.26 GiB 1500.30 GB)
> Data Offset : 272 sectors
> Super Offset : 8 sectors
> State : active
> Device UUID : 3ce8e262:ad864aee:9055af9b:6cbfd47f
>
> Update Time : Sat Mar 16 20:20:47 2013
> Checksum : a7686a57 - correct
> Events : 180132
>
> Layout : left-symmetric
> Chunk Size : 256K
>
> Device Role : Active device 2
> Array State : AAAA ('A' == active, '.' == missing)
> mdadm: No md superblock detected on /dev/sdc.
>
> Before I assembled the array for the first time (mdadm -A /dev/md101
> /dev/sdd1 /dev/sde1 /dev/sdf1), this is what it looked like.
> So identical to the above, with the exception of the number of events.
>
> riley tmp # mdadm --examine /dev/sde1
> /dev/sde1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 2becc012:2d317133:2447784c:1aab300d
> Name : riley:data01 (local to host riley)
> Creation Time : Tue Apr 27 18:03:37 2010
> Raid Level : raid5
> Raid Devices : 4
>
> Avail Dev Size : 2930275847 (1397.26 GiB 1500.30 GB)
> Array Size : 8790827520 (4191.79 GiB 4500.90 GB)
> Used Dev Size : 2930275840 (1397.26 GiB 1500.30 GB)
> Data Offset : 776 sectors
> Super Offset : 8 sectors
> State : clean
> Device UUID : 3f48d5a8:e3ee47a1:23c8b895:addd3dd0
>
> Update Time : Sat Apr 6 09:44:30 2013
> Checksum : eaebe3ea - correct
> Events : 512989
>
> Layout : left-symmetric
> Chunk Size : 256K
>
> Device Role : Active device 3
> Array State : AA.A ('A' == active, '.'
== missing)
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
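
For what it's worth, the event-count comparison I did by eye on the
--examine output quoted above could be sketched roughly like this. It is
only an illustration: the function names (parse_examine, stale_members)
are made up for this mail, and SAMPLE is just the device headers and
Events lines trimmed from the output above. It only compares what the
superblocks say about themselves; it says nothing about whether the data
on the members is actually consistent.

```python
import re


def parse_examine(text):
    """Map each member device to the 'Events' count found in
    `mdadm --examine` style output (hypothetical helper)."""
    events = {}
    device = None
    for raw in text.splitlines():
        line = raw.strip()
        # A section starts with a line such as "/dev/sdf1:".
        m = re.match(r"^(/dev/\S+):$", line)
        if m:
            device = m.group(1)
            continue
        # Within a section, pick up "Events : <number>".
        m = re.match(r"^Events\s*:\s*(\d+)$", line)
        if m and device is not None:
            events[device] = int(m.group(1))
    return events


def stale_members(events):
    """Return {device: how many events it is behind the newest member}."""
    newest = max(events.values())
    return {dev: newest - n for dev, n in events.items() if n < newest}


# Trimmed from the --examine output quoted in this thread.
SAMPLE = """\
/dev/sdf1:
    Events : 512993
mdadm: No md superblock detected on /dev/sdf.
/dev/sde1:
    Events : 512993
/dev/sdd1:
    Events : 512993
/dev/sdc1:
    Events : 180132
"""

if __name__ == "__main__":
    for dev, lag in sorted(stale_members(parse_examine(SAMPLE)).items()):
        print(f"{dev} is {lag} events behind the newest member")
```

With the numbers above this flags only /dev/sdc1, which matches what I
saw: sdd1, sde1 and sdf1 agree, and sdc1 is far behind.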