From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Patrick H."
Subject: Re: filesystem corruption
Date: Tue, 04 Jan 2011 10:31:56 -0700
Message-ID: <4D23598C.3040901@feystorm.net>
References: <4D212D4A.3040003@feystorm.net> <20110103141603.632fdf3e@notabene.brown> <4D214B5C.3010103@feystorm.net> <20110103155630.565341d0@notabene.brown> <4D215902.9010308@feystorm.net> <20110104163324.70baff54@notabene.brown> <4D22D14F.8070407@feystorm.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <4D22D14F.8070407@feystorm.net>
Sender: linux-raid-owner@vger.kernel.org
To: linux-raid@vger.kernel.org
List-Id: linux-raid.ids

Sent: Tue Jan 04 2011 00:50:39 GMT-0700 (Mountain Standard Time)
From: Patrick H.
To: linux-raid@vger.kernel.org
Subject: Re: filesystem corruption
> Sent: Mon Jan 03 2011 22:33:24 GMT-0700 (Mountain Standard Time)
> From: NeilBrown
> To: Patrick H. linux-raid@vger.kernel.org
> Subject: Re: filesystem corruption
>> On Sun, 02 Jan 2011 22:05:06 -0700 "Patrick H." wrote:
>>
>>> Ok, thanks for the info.
>>> I think I'll solve it by creating 2 dedicated hosts for running the
>>> array, but not actually export any disks themselves. This way if a
>>> master dies, all the raid disks are still there and can be picked up
>>> by the other master.
>>>
>>
>> That sounds like it should work OK.
>>
>> NeilBrown
>>
> Well, it didn't solve it. If I power the entire cluster down and start
> it back up, I still get corruption, on old files that weren't being
> modified. If I power off just a single node, it seems to handle it fine,
> just not the whole cluster.
>
> It also seems to happen fairly frequently now. In the previous setup
> it was probably 1 in 50 failures that there was corruption. Now it's
> pretty much a guarantee there will be corruption if I kill it.
> On the last failure I did, when it came back up, it re-assembled the
> entire raid-5 array with all disks active and none of them needing any
> sort of re-sync. The disk controller is battery backed, so even if it
> was re-ordering the writes, the battery should ensure that it all gets
> committed.
>
> Any other ideas?
>
> -Patrick

Here is some info from my most recent failure simulation. This one
resulted in about 50 corrupt files, another 40 or so that can't even be
opened, and one stale nfs file handle.
I had the cluster script dump out a bunch of info before and after
assembling the array.

= = = = = = = = = =
# mdadm -E /dev/etherd/e1.1p1
/dev/etherd/e1.1p1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 9cd9ae9b:39454845:62f2b08d:a4a1ac6c
           Name : dm01:126 (local to host dm01)
  Creation Time : Tue Jan 4 04:45:50 2011
     Raid Level : raid5
   Raid Devices : 3
 Avail Dev Size : 2119520 (1035.10 MiB 1085.19 MB)
     Array Size : 4238848 (2.02 GiB 2.17 GB)
  Used Dev Size : 2119424 (1035.05 MiB 1085.15 MB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : a20adb76:af00f276:5be79a36:b4ff3a8b
Internal Bitmap : 2 sectors from superblock
    Update Time : Tue Jan 4 16:45:56 2011
       Checksum : 361041f6 - correct
         Events : 486
         Layout : left-symmetric
     Chunk Size : 64K
    Device Role : Active device 0
    Array State : AAA ('A' == active, '.'
== missing)
# mdadm -X /dev/etherd/e1.1p1
        Filename : /dev/etherd/e1.1p1
           Magic : 6d746962
         Version : 4
            UUID : 9cd9ae9b:39454845:62f2b08d:a4a1ac6c
          Events : 486
  Events Cleared : 486
           State : OK
       Chunksize : 64 KB
          Daemon : 5s flush period
      Write Mode : Normal
       Sync Size : 1059712 (1035.05 MiB 1085.15 MB)
          Bitmap : 16558 bits (chunks), 189 dirty (1.1%)
= = = = = = = = = =
= = = = = = = = = =
# mdadm -E /dev/etherd/e2.1p1
/dev/etherd/e2.1p1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 9cd9ae9b:39454845:62f2b08d:a4a1ac6c
           Name : dm01:126 (local to host dm01)
  Creation Time : Tue Jan 4 04:45:50 2011
     Raid Level : raid5
   Raid Devices : 3
 Avail Dev Size : 2119520 (1035.10 MiB 1085.19 MB)
     Array Size : 4238848 (2.02 GiB 2.17 GB)
  Used Dev Size : 2119424 (1035.05 MiB 1085.15 MB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : f9205ace:0796ecf5:2cca363c:c2873816
Internal Bitmap : 2 sectors from superblock
    Update Time : Tue Jan 4 16:45:56 2011
       Checksum : 9d235885 - correct
         Events : 486
         Layout : left-symmetric
     Chunk Size : 64K
    Device Role : Active device 1
    Array State : AAA ('A' == active, '.'
== missing)
# mdadm -X /dev/etherd/e2.1p1
        Filename : /dev/etherd/e2.1p1
           Magic : 6d746962
         Version : 4
            UUID : 9cd9ae9b:39454845:62f2b08d:a4a1ac6c
          Events : 486
  Events Cleared : 486
           State : OK
       Chunksize : 64 KB
          Daemon : 5s flush period
      Write Mode : Normal
       Sync Size : 1059712 (1035.05 MiB 1085.15 MB)
          Bitmap : 16558 bits (chunks), 189 dirty (1.1%)
= = = = = = = = = =
= = = = = = = = = =
# mdadm -E /dev/etherd/e3.1p1
/dev/etherd/e3.1p1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 9cd9ae9b:39454845:62f2b08d:a4a1ac6c
           Name : dm01:126 (local to host dm01)
  Creation Time : Tue Jan 4 04:45:50 2011
     Raid Level : raid5
   Raid Devices : 3
 Avail Dev Size : 2119520 (1035.10 MiB 1085.19 MB)
     Array Size : 4238848 (2.02 GiB 2.17 GB)
  Used Dev Size : 2119424 (1035.05 MiB 1085.15 MB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 7f90958d:22de5c08:88750ecb:5f376058
Internal Bitmap : 2 sectors from superblock
    Update Time : Tue Jan 4 16:46:13 2011
       Checksum : 3fce6b33 - correct
         Events : 487
         Layout : left-symmetric
     Chunk Size : 64K
    Device Role : Active device 2
    Array State : AAA ('A' == active, '.'
== missing)
# mdadm -X /dev/etherd/e3.1p1
        Filename : /dev/etherd/e3.1p1
           Magic : 6d746962
         Version : 4
            UUID : 9cd9ae9b:39454845:62f2b08d:a4a1ac6c
          Events : 487
  Events Cleared : 486
           State : OK
       Chunksize : 64 KB
          Daemon : 5s flush period
      Write Mode : Normal
       Sync Size : 1059712 (1035.05 MiB 1085.15 MB)
          Bitmap : 16558 bits (chunks), 249 dirty (1.5%)
= = = = = = = = = =

- - - - - - - - - - -
# mdadm -D /dev/md/fs01
/dev/md/fs01:
        Version : 1.2
  Creation Time : Tue Jan 4 04:45:50 2011
     Raid Level : raid5
     Array Size : 2119424 (2.02 GiB 2.17 GB)
  Used Dev Size : 1059712 (1035.05 MiB 1085.15 MB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent
  Intent Bitmap : Internal
    Update Time : Tue Jan 4 16:46:13 2011
          State : active, resyncing
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
         Layout : left-symmetric
     Chunk Size : 64K
 Rebuild Status : 1% complete
           Name : dm01:126 (local to host dm01)
           UUID : 9cd9ae9b:39454845:62f2b08d:a4a1ac6c
         Events : 486

    Number   Major   Minor   RaidDevice State
       0     152      273        0      active sync   /dev/block/152:273
       1     152      529        1      active sync   /dev/block/152:529
       3     152      785        2      active sync   /dev/block/152:785
- - - - - - - - - - -

The old method *never* resulted in this much corruption, and never
generated stale nfs file handles. Why is this so much worse now when it
was supposed to be better?
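
For anyone lining up dumps like the ones above, here is a minimal sketch of comparing the per-device superblock fields. It is not part of the cluster script; the helper names (parse_examine, compare_events) and the FIELDS list are hypothetical, and the sample text is just the State, Update Time and Events lines copied from the three `mdadm -E` dumps. It only parses text that has already been captured and does not call mdadm itself.

```python
# Hypothetical helpers for illustration; field names match the `mdadm -E` dumps above.
FIELDS = ("State", "Update Time", "Events")


def parse_examine(text):
    """Pick the selected 'Key : value' fields out of captured `mdadm -E` text."""
    found = {}
    for line in text.splitlines():
        # Split on the first " : " only, so the colons inside timestamps survive.
        key, sep, value = line.partition(" : ")
        if sep and key.strip() in FIELDS:
            found[key.strip()] = value.strip()
    return found


def compare_events(dumps):
    """Return ({device: event count}, True if the counts are not all equal)."""
    counts = {dev: int(parse_examine(text)["Events"]) for dev, text in dumps.items()}
    return counts, len(set(counts.values())) > 1


# Excerpts copied from the three dumps in this message.
SAMPLE = {
    "/dev/etherd/e1.1p1": "State : clean\nUpdate Time : Tue Jan 4 16:45:56 2011\nEvents : 486",
    "/dev/etherd/e2.1p1": "State : clean\nUpdate Time : Tue Jan 4 16:45:56 2011\nEvents : 486",
    "/dev/etherd/e3.1p1": "State : active\nUpdate Time : Tue Jan 4 16:46:13 2011\nEvents : 487",
}

counts, differ = compare_events(SAMPLE)
for dev, events in counts.items():
    print(dev, parse_examine(SAMPLE[dev])["State"], events)
print("event counts differ:", differ)
```

On the excerpts above this flags e3.1p1 as the odd one out (State active, Events 487) against the other two members (State clean, Events 486), which matches what the full dumps show.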