* recovering corrupt filesystem after raid failure
@ 2016-02-22 1:29 David Lechner
2016-02-22 2:24 ` Dave Chinner
0 siblings, 1 reply; 3+ messages in thread
From: David Lechner @ 2016-02-22 1:29 UTC (permalink / raw)
To: xfs
Long story short, I had a dual disk failure in a raid 5. I've managed to
get the raid back up and salvaged what I could. However, the xfs is
seriously damaged. I've tried running xfs_repair, but it is failing and
it recommended sending a message to this mailing list. This is an Ubuntu
12.04 machine, so xfs_repair version 3.1.7.
The file system won't mount. Fails with "mount: Structure needs
cleaning". So I tried xfs_repair. I had to resort to xfs_repair -L
because the first 500MB or so of the filesystem was wiped out. Now,
xfs_repair /dev/md127 gets stuck, so I am running xfs_repair -P
/dev/md127. This gets much farther, but it is failing too. It gives an
error message like this:
...
disconnected inode 2101958, moving to lost+found
corrupt dinode 2101958, extent total = 1, nblocks = 0. This is a bug.
Please capture the filesystem metadata with xfs_metadump and
report it to xfs@oss.sgi.com.
cache_node_purge: refcount was 1, not zero (node=0x7f2c57e1b120)
fatal error -- 117 - couldn't iget disconnected inode
However, nblocks = 0 does not seem to be true...
xfs_db -x /dev/md127
cache_node_purge: refcount was 1, not zero (node=0x219c9e0)
xfs_db: cannot read root inode (117)
cache_node_purge: refcount was 1, not zero (node=0x21a0620)
xfs_db: cannot read realtime bitmap inode (117)
xfs_db> inode 2101958
xfs_db> print
core.magic = 0x494e
core.mode = 0100664
core.version = 2
core.format = 2 (extents)
core.nlinkv2 = 1
core.onlink = 0
core.projid_lo = 0
core.projid_hi = 0
core.uid = 119
core.gid = 133
core.flushiter = 5
core.atime.sec = Sun Apr 26 02:30:54 2015
core.atime.nsec = 000000000
core.mtime.sec = Fri Nov 7 14:54:27 2014
core.mtime.nsec = 000000000
core.ctime.sec = Sun Apr 26 02:30:54 2015
core.ctime.nsec = 941028318
core.size = 279864
core.nblocks = 69
core.extsize = 0
core.nextents = 1
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.filestream = 0
core.gen = 3320313054
next_unlinked = null
u.bmx[0] = [startoff,startblock,blockcount,extentflag] 0:[0,147322885,69,0]
If I re-run xfs_repair -P /dev/md127, it will fail on different
seemingly random inode with the same error message.
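For reference, the escalation described above (mount fails, plain repair hangs, then -L and -P) can be sketched as a shell script. /dev/md127 and /mnt are this machine's paths; the RUN_REPAIR opt-in guard is my addition so the sketch does nothing destructive unless deliberately enabled:

```shell
#!/bin/sh
# Sketch of the repair escalation described in this thread.
# WARNING: xfs_repair -L zeroes the log and discards any unreplayed
# transactions -- take a block-level image of the device first if possible.
DEV=/dev/md127   # the poster's md device; substitute your own

# Opt-in guard: set RUN_REPAIR=1 to actually execute the repair steps.
if [ "${RUN_REPAIR:-0}" = 1 ] && [ -b "$DEV" ]; then
    mount "$DEV" /mnt || true    # fails here: "Structure needs cleaning"
    xfs_repair -n "$DEV" || true # dry run: report problems, change nothing
    xfs_repair -L "$DEV"         # zero the (unrecoverable) log, then repair
    xfs_repair -P "$DEV"         # retry with inode/dir prefetch disabled
fi
echo "repair sequence targets $DEV"
```

Running xfs_repair -n first is the safe way to gauge the damage before committing to -L.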
I've uploaded the output of xfs_metadump to dropbox if anyone would like
to have a look. It is 22MB compressed, 2.2GB uncompressed.
https://www.dropbox.com/s/o18cxapu7o75sor/xfs_metadump.xz?dl=0
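The dump linked above would have been produced with something like the following sketch; the output filename and the opt-in guard are my choices, and -g just prints progress:

```shell
#!/bin/sh
# Capture filesystem metadata (no file contents) for bug reporting, then
# compress it -- metadata-only images compress very well, which is how
# 2.2GB shrinks to ~22MB here.
DEV=/dev/md127
OUT=xfs_metadump.img

# Opt-in guard: set RUN_DUMP=1 to actually capture from a real device.
if [ "${RUN_DUMP:-0}" = 1 ] && [ -b "$DEV" ]; then
    xfs_metadump -g "$DEV" "$OUT"   # -g: show progress while dumping
    xz -9 "$OUT"                    # produces xfs_metadump.img.xz
fi
echo "would dump $DEV metadata to $OUT.xz"
```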
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: recovering corrupt filesystem after raid failure
2016-02-22 1:29 recovering corrupt filesystem after raid failure David Lechner
@ 2016-02-22 2:24 ` Dave Chinner
2016-02-22 17:53 ` David Lechner
0 siblings, 1 reply; 3+ messages in thread
From: Dave Chinner @ 2016-02-22 2:24 UTC (permalink / raw)
To: David Lechner; +Cc: xfs
On Sun, Feb 21, 2016 at 07:29:54PM -0600, David Lechner wrote:
> Long story short, I had a dual disk failure in a raid 5. I've managed to
> get the raid back up and salvaged what I could. However, the xfs is
> seriously damaged. I've tried running xfs_repair, but it is failing and
> it recommended sending a message to this mailing list. This is an Ubuntu
> 12.04 machine, so xfs_repair version 3.1.7.
So the first thing to do is get a more recent xfsprogs package and
try that. There's not a lot of point in us looking at problems with
a four-and-a-half-year-old package that we've probably already fixed.
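As a rough way to act on this advice, the installed xfs_repair version can be compared against a floor; the 3.2.0 floor below is my assumption about what counted as "current" in early 2016, not a figure from this thread:

```shell
#!/bin/sh
# Compare the installed xfs_repair version against an assumed minimum.
min=3.2.0                       # assumed floor; in practice, use the latest release
if command -v xfs_repair >/dev/null 2>&1; then
    # "xfs_repair version 3.1.7" -> "3.1.7"
    ver=$(xfs_repair -V | awk '{print $NF}')
else
    ver=3.1.7                   # the Ubuntu 12.04 version from this report
fi
# GNU sort -V orders version strings; the smaller of the two comes first.
oldest=$(printf '%s\n%s\n' "$ver" "$min" | sort -V | head -n 1)
if [ "$oldest" = "$ver" ] && [ "$ver" != "$min" ]; then
    verdict="xfs_repair $ver predates $min -- upgrade before repairing"
else
    verdict="xfs_repair $ver is recent enough"
fi
echo "$verdict"
```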
> The file system won't mount. Fails with "mount: Structure needs
> cleaning". So I tried xfs_repair. I had to resort to xfs_repair -L
> because the first 500MB or so of the filesystem was wiped out.
Oh, so even if you can repair the filesystem, your data is likely to
be irretrievably corrupted.
> Now,
> xfs_repair /dev/md127 gets stuck, so I am running xfs_repair -P
> /dev/md127. This gets much farther, but it is failing too. It gives an
> error message like this:
>
>
> ...
> disconnected inode 2101958, moving to lost+found
> corrupt dinode 2101958, extent total = 1, nblocks = 0. This is a bug.
> Please capture the filesystem metadata with xfs_metadump and
> report it to xfs@oss.sgi.com.
> cache_node_purge: refcount was 1, not zero (node=0x7f2c57e1b120)
>
> fatal error -- 117 - couldn't iget disconnected inode
>
>
>
> However, nblocks = 0 does not seem to be true...
Probably because it got cleared in memory before this problem was
tripped over.
> If I re-run xfs_repair -P /dev/md127, it will fail on different
> seemingly random inode with the same error message.
Yup, you definitely need to run a current xfs_repair on this
filesystem before going any further.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: recovering corrupt filesystem after raid failure
2016-02-22 2:24 ` Dave Chinner
@ 2016-02-22 17:53 ` David Lechner
0 siblings, 0 replies; 3+ messages in thread
From: David Lechner @ 2016-02-22 17:53 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On 02/21/2016 08:24 PM, Dave Chinner wrote:
> On Sun, Feb 21, 2016 at 07:29:54PM -0600, David Lechner wrote:
>> Long story short, I had a dual disk failure in a raid 5. I've managed to
>> get the raid back up and salvaged what I could. However, the xfs is
>> seriously damaged. I've tried running xfs_repair, but it is failing and
>> it recommended sending a message to this mailing list. This is an Ubuntu
>> 12.04 machine, so xfs_repair version 3.1.7.
>
> So the first thing to do is get a more recent xfsprogs package and
> try that. There's not a lot of point in us looking at problems with
> a four-and-a-half-year-old package that we've probably already fixed.
>
>> The file system won't mount. Fails with "mount: Structure needs
>> cleaning". So I tried xfs_repair. I had to resort to xfs_repair -L
>> because the first 500MB or so of the filesystem was wiped out.
>
> Oh, so even if you can repair the filesystem, your data is likely to
> be irretrievably corrupted.
>
>> Now,
>> xfs_repair /dev/md127 gets stuck, so I am running xfs_repair -P
>> /dev/md127. This gets much farther, but it is failing too. It gives an
>> error message like this:
>>
>>
>> ...
>> disconnected inode 2101958, moving to lost+found
>> corrupt dinode 2101958, extent total = 1, nblocks = 0. This is a bug.
>> Please capture the filesystem metadata with xfs_metadump and
>> report it to xfs@oss.sgi.com.
>> cache_node_purge: refcount was 1, not zero (node=0x7f2c57e1b120)
>>
>> fatal error -- 117 - couldn't iget disconnected inode
>>
>>
>>
>> However, nblocks = 0 does not seem to be true...
>
> Probably because it got cleared in memory before this problem was
> tripped over.
>
>> If I re-run xfs_repair -P /dev/md127, it will fail on different
>> seemingly random inode with the same error message.
>
> Yup, you definitely need to run a current xfs_repair on this
> filesystem before going any further.
>
> Cheers,
>
> Dave.
>
Thanks for the advice. The newer version was able to complete
successfully. I can now mount the file system and I ended up with 1.5TB
in lost+found, so at least there is still something there.