From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Sat, 2 Mar 2013 07:53:05 +1100
From: Dave Chinner
Subject: Re: xfs_repair segfaults
Message-ID: <20130301205305.GD23616@dastard>
References: <20130301111701.GB23616@dastard>
List-Id: XFS Filesystem from SGI
To: Ole Tange
Cc: xfs@oss.sgi.com

On Fri, Mar 01, 2013 at 01:24:36PM +0100, Ole Tange wrote:
> On Fri, Mar 1, 2013 at 12:17 PM, Dave Chinner wrote:
> > On Thu, Feb 28, 2013 at 04:22:08PM +0100, Ole Tange wrote:
> :
> >> I forced a RAID online. I have done that before and xfs_repair
> >> normally removes the last hour of data or so, but saves everything
> >> else.
> >
> > Why did you need to force it online?
>
> More than 2 harddisks went offline. We have seen that before and it is
> not due to bad harddisks. It may be due to driver/timings/controller.

I thought that might be the case. What filesystem errors occurred when
the drives went offline?

> >> /usr/local/src/xfsprogs-3.1.10/repair# ./xfs_repair -n /dev/md5p1
> >> Phase 1 - find and verify superblock...
> >> Phase 2 - using internal log
> >>         - scan filesystem freespace and inode maps...
> >> flfirst 232 in agf 91 too large (max = 128)
> >
> > Can you run:
> >
> > # xfs_db -c "agf 91" -c p /dev/md5p1
> >
> > And post the output?
>
> # xfs_db -c "agf 91" -c p /dev/md5p1
> xfs_db: cannot init perag data (117)

Interesting. It's detecting corrupt AG headers.

> magicnum = 0x58414746
> versionnum = 1
> seqno = 91
> length = 268435200
> bnoroot = 295199
> cntroot = 13451007
> bnolevel = 2
> cntlevel = 2
> flfirst = 232
> fllast = 32
> flcount = 191

That implies the free list is actually 232 + 191 - 32 = 391 entries
long. That doesn't add up any way I look at it. Both the flfirst and
flcount fields look wrong here, which rules out a simple bit error as
the problem. I can't see how these values would have been written by
XFS, as they are out of range for a 512 byte sector AGFL:

	be32_add_cpu(&agf->agf_flfirst, 1);
	xfs_trans_brelse(tp, agflbp);
	if (be32_to_cpu(agf->agf_flfirst) == XFS_AGFL_SIZE(mp))
		agf->agf_flfirst = 0;

So I suspect that something more than just disks going offline went
wrong here, as I've never seen this sort of corruption before...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
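[Editor's note: to illustrate why those AGFL fields can't be valid, here
is a rough sketch of the circular free-list arithmetic for a 128-entry
(512 byte sector) AGFL. This is not XFS's or xfs_repair's actual
validation code; the function name and checks are made up for
illustration.]

```python
# Sanity-check AGF free-list fields, assuming a 512-byte sector AGFL
# that holds 128 entries. Illustrative only, not real xfs_repair logic.
AGFL_SIZE = 128

def agfl_consistent(flfirst, fllast, flcount):
    # Both indices must fall inside the on-disk array.
    if flfirst >= AGFL_SIZE or fllast >= AGFL_SIZE:
        return False
    # The AGFL is circular: count entries from flfirst to fllast
    # inclusive, wrapping around the end of the array.
    # (Edge case of an empty list, flcount == 0, is ignored here.)
    expected = (fllast - flfirst) % AGFL_SIZE + 1
    return flcount == expected

# The values reported above fail immediately: flfirst = 232 is already
# past the end of a 128-entry array, before flcount is even checked.
print(agfl_consistent(232, 32, 191))   # False
# A plausible healthy layout: 32 entries starting at index 0.
print(agfl_consistent(0, 31, 32))      # True
# A wrapped but still consistent list: 128 - 120 + 7 + 1 = 16 entries.
print(agfl_consistent(120, 7, 16))     # True
```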