From: David Lechner
Subject: recovering corrupt filesystem after raid failure
Date: Sun, 21 Feb 2016 19:29:54 -0600
Message-ID: <56CA6492.7000407@lechnology.com>
To: xfs@oss.sgi.com

Long story short: I had a dual-disk failure in a RAID 5 array. I've managed to get the array back up and salvaged what I could. However, the XFS filesystem on it is seriously damaged. I've tried running xfs_repair, but it fails and recommends sending a message to this mailing list. This is an Ubuntu 12.04 machine, so xfs_repair is version 3.1.7.

The filesystem won't mount; it fails with "mount: Structure needs cleaning". So I tried xfs_repair. I had to resort to xfs_repair -L because the first 500 MB or so of the filesystem was wiped out. Now plain xfs_repair /dev/md127 gets stuck, so I am running xfs_repair -P /dev/md127 instead. That gets much farther, but it fails too.
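For reference, the sequence above can be sketched as a small script (the device is /dev/md127 as in this report; the run() helper only echoes each command rather than executing it, so the destructive steps cannot fire by accident):

```shell
#!/bin/sh
# Recovery sequence sketched from the description above.
# NOTE: run() only echoes the command; drop the helper to actually execute.
DEV=${DEV:-/dev/md127}

run() {
    echo "$*"
}

# The log was unrecoverable (first ~500 MB of the fs wiped), so zero it:
run xfs_repair -L "$DEV"

# Plain xfs_repair hung, so disable inode prefetch with -P:
run xfs_repair -P "$DEV"

# Capture metadata for the list to inspect:
run xfs_metadump "$DEV" md127.metadump
```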
It gives an error message like this:

    ...
    disconnected inode 2101958, moving to lost+found
    corrupt dinode 2101958, extent total = 1, nblocks = 0.  This is a bug.
    Please capture the filesystem metadata with xfs_metadump and
    report it to xfs@oss.sgi.com.
    cache_node_purge: refcount was 1, not zero (node=0x7f2c57e1b120)
    fatal error -- 117 - couldn't iget disconnected inode

However, nblocks = 0 does not seem to be true:

    # xfs_db -x /dev/md127
    cache_node_purge: refcount was 1, not zero (node=0x219c9e0)
    xfs_db: cannot read root inode (117)
    cache_node_purge: refcount was 1, not zero (node=0x21a0620)
    xfs_db: cannot read realtime bitmap inode (117)
    xfs_db> inode 2101958
    xfs_db> print
    core.magic = 0x494e
    core.mode = 0100664
    core.version = 2
    core.format = 2 (extents)
    core.nlinkv2 = 1
    core.onlink = 0
    core.projid_lo = 0
    core.projid_hi = 0
    core.uid = 119
    core.gid = 133
    core.flushiter = 5
    core.atime.sec = Sun Apr 26 02:30:54 2015
    core.atime.nsec = 000000000
    core.mtime.sec = Fri Nov  7 14:54:27 2014
    core.mtime.nsec = 000000000
    core.ctime.sec = Sun Apr 26 02:30:54 2015
    core.ctime.nsec = 941028318
    core.size = 279864
    core.nblocks = 69
    core.extsize = 0
    core.nextents = 1
    core.naextents = 0
    core.forkoff = 0
    core.aformat = 2 (extents)
    core.dmevmask = 0
    core.dmstate = 0
    core.newrtbm = 0
    core.prealloc = 0
    core.realtime = 0
    core.immutable = 0
    core.append = 0
    core.sync = 0
    core.noatime = 0
    core.nodump = 0
    core.rtinherit = 0
    core.projinherit = 0
    core.nosymlinks = 0
    core.extsz = 0
    core.extszinherit = 0
    core.nodefrag = 0
    core.filestream = 0
    core.gen = 3320313054
    next_unlinked = null
    u.bmx[0] = [startoff,startblock,blockcount,extentflag]
    0:[0,147322885,69,0]

If I re-run xfs_repair -P /dev/md127, it fails on a different, seemingly random inode with the same error message. I've uploaded the output of xfs_metadump to Dropbox if anyone would like to have a look. It is 22 MB compressed, 2.2 GB uncompressed.
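For anyone reproducing this, the interactive xfs_db session above can also be driven non-interactively via its -c option; a sketch (echoed rather than executed, so it is safe to run without the array present, and assuming /dev/md127 as above):

```shell
#!/bin/sh
# Sketch: assemble the non-interactive equivalent of the xfs_db session
# above. db_cmd only echoes the command line; pipe it to a shell to run it.
DEV=${DEV:-/dev/md127}
INODE=2101958

db_cmd() {
    echo "xfs_db -x -c 'inode $1' -c print $DEV"
}
db_cmd "$INODE"

# Cross-check from the dump above: the single extent
# u.bmx[0] = 0:[0,147322885,69,0] covers 69 blocks, matching
# core.nblocks = 69 -- so the "nblocks = 0" in the repair error
# looks wrong for this inode.
EXTENT_BLOCKS=69
NBLOCKS=69
[ "$EXTENT_BLOCKS" -eq "$NBLOCKS" ] && echo "inode self-consistent"
```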
https://www.dropbox.com/s/o18cxapu7o75sor/xfs_metadump.xz?dl=0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs