Date: Sat, 24 Aug 2013 19:43:23 -0400 (EDT)
From: Jay Ashworth
To: xfs@oss.sgi.com
Subject: Re: XFS recovery resumes...
Message-ID: <20493414.4932.1377387802955.JavaMail.root@benjamin.baylink.com>
In-Reply-To: <5211BF74.9060605@hardwarefreak.com>
List-Id: XFS Filesystem from SGI

----- Original Message -----
> From: "Stan Hoeppner"

> Joe appears to have hit the nail on the head WRT this being a hardware
> problem. This error confirms it. It would appear that when the Antec
> PSU went South it damaged a motherboard device, possibly a VRM, probably
> a cap or two, or more. Maybe damaged a DRAM cell or few that work fine
> with memtest86+ but not with the access pattern generated by your XFS
> workload.

Well, it appears you may be right.

I'd got all the data off that 3T with no read failures, and then remade
the filesystem. I had to use -f because it saw the old one, but I don't
know if that's pertinent here or not.

Anyroad, I made the new filesystem, with whatever mkfs.xfs's defaults are
for a 3T filesystem in 3.1.11, and then started rsyncing the 2TB drive
onto it, so I could fix that one.

Got 88GB in, and did the same thing:

===========================================
Aug 22 13:34:13 duckling kernel: [67215.008867] XFS (sda1): Corruption detected. Unmount and run xfs_repair
Aug 22 13:34:13 duckling kernel: [67215.008899] XFS (sda1): Internal error xfs_trans_cancel at line 1467 of file /home/abuild/rpmbuild/BUILD/kernel-default-3.4.47/linux-3.4/fs/xfs/xfs_trans.c. Caller 0xe3d9349d
Aug 22 13:34:13 duckling kernel: [67215.008903]
Aug 22 13:34:13 duckling kernel: [67215.008910] Pid: 4122, comm: rsync Not tainted 3.4.47-2.38-default #1
Aug 22 13:34:13 duckling kernel: [67215.008914] Call Trace:
Aug 22 13:34:13 duckling kernel: [67215.008946] [] try_stack_unwind+0x199/0x1b0
Aug 22 13:34:13 duckling kernel: [67215.008959] [] dump_trace+0x47/0xf0
Aug 22 13:34:13 duckling kernel: [67215.008968] [] show_trace_log_lvl+0x4b/0x60
Aug 22 13:34:13 duckling kernel: [67215.008975] [] show_trace+0x18/0x20
Aug 22 13:34:13 duckling kernel: [67215.008986] [] dump_stack+0x6d/0x72
Aug 22 13:34:13 duckling kernel: [67215.009137] [] xfs_trans_cancel+0xe7/0x110 [xfs]
Aug 22 13:34:13 duckling kernel: [67215.009426] [] xfs_create+0x22d/0x570 [xfs]
Aug 22 13:34:13 duckling kernel: [67215.009551] [] xfs_vn_mknod+0x8a/0x170 [xfs]
Aug 22 13:34:13 duckling kernel: [67215.009624] [] vfs_create+0xa3/0x130
Aug 22 13:34:13 duckling kernel: [67215.009634] [] do_last+0x6b5/0x7e0
Aug 22 13:34:13 duckling kernel: [67215.009644] [] path_openat+0xaa/0x360
Aug 22 13:34:13 duckling kernel: [67215.009652] [] do_filp_open+0x2e/0x80
Aug 22 13:34:13 duckling kernel: [67215.009664] [] do_sys_open+0xee/0x1d0
Aug 22 13:34:13 duckling kernel: [67215.009673] [] sys_open+0x30/0x40
Aug 22 13:34:13 duckling kernel: [67215.009687] [] sysenter_do_call+0x12/0x28
Aug 22 13:34:13 duckling kernel: [67215.009719] [] 0xb76bb42f
Aug 22 13:34:13 duckling kernel: [67215.009726] XFS (sda1): xfs_do_force_shutdown(0x8) called from line 1468 of file /home/abuild/rpmbuild/BUILD/kernel-default-3.4.47/linux-3.4/fs/xfs/xfs_trans.c. Return address = 0xe3dd2d5f
Aug 22 13:34:13 duckling kernel: [67215.034952] XFS (sda1): Corruption of in-memory data detected. Shutting down filesystem
Aug 22 13:34:13 duckling kernel: [67215.034966] XFS (sda1): Please umount the filesystem and rectify the problem(s)
===========================================

Followed by the obligatory:

Aug 22 13:35:37 duckling kernel: [67299.040080] XFS (sda1): xfs_log_force: error 5 returned.

a lot.

> I'd first try manually clocking the DIMMs down a bit, from 400 to 333,
> or 333 to 266, whichever is called for. IIRC that VIA Northbrige has
> decoupled CPU and DRAM buses so you should be able to clock the DRAM
> down without affecting CPU frequency. If the problem persists, swap the
> DIMMs if you have some on hand or can get them really cheap like $10
> for a pair.

I'll try swapping it; this mobo has always gotten whacky if we went over
512M, which is why we haven't. I don't know if I can manually reclock
the ram, though I might can turn the waitstates up.

> If that doesn't fix it, this may be a viable inexpensive
> solution:
>
> http://www.newegg.com/Product/Product.aspx?Item=N82E16813186215
> http://www.newegg.com/Product/Product.aspx?Item=N82E16819103888
> http://www.newegg.com/Product/Product.aspx?Item=N82E16820145252
>
> $109 to replace your central electronics complex. This is the least
> expensive quality set of parts with good feature set I could come up
> with at Newegg, to take the sting out of dropping cash on a forced
> upgrade. $15 more for the Foxconn AM3 board w/HDMI if you have a newer
> TV or AV receiver.

Well, I can live without HDMI, but my present MS-7021 mobo has 5 PCI
slots, and I'm using all of them: 2 PVR-150s, a PVR-500, and a SiI 4-port
raid (which will talk to 2 and 3TB drives; the motherboard SATA won't even
see them). I forget what's in 5, but I think it was the only VGA card I
had with S-Video out.

So, while that's a damn nice price point, it will require me to buy a
bunch of Ethernet tuners as well.

I'll try the RAM. It's really odd, though, that the badblocks workload
and both memtests couldn't find a problem, if it is the memory plane...

Cheers,
-- jra
-- 
Jay R. Ashworth                  Baylink                       jra@baylink.com
Designer                     The Things I Think                       RFC 2100
Ashworth & Associates     http://baylink.pitas.com         2000 Land Rover DII
St Petersburg FL USA               #natog                      +1 727 647 1274

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs