* Problem recovering XFS filesystem
@ 2012-04-26 20:00 Aaron Williams
2012-04-27 21:31 ` Michael Monnerie
0 siblings, 1 reply; 5+ messages in thread
From: Aaron Williams @ 2012-04-26 20:00 UTC (permalink / raw)
To: xfs
Hi,
I had an issue with my RAID system and am having problems trying to recover
my XFS filesystem.
First of all, I made a copy of the filesystem to another device (using dd),
and I was able to recover that image, with some data loss, by blowing away
the log. I would like to try to recover it properly, however.
I have extracted all of the files from the recovered version and am now
trying to recover again without blowing away the log.
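The imaging step described above can be sketched as a dry run. This is not a command from the thread: the `run` helper only echoes each command so the sketch is safe to execute, and the destination path is made up; swap `echo` for real execution on your own system.

```shell
# Dry-run sketch of imaging a damaged device before experimenting on it,
# so every repair attempt runs against the copy. /dev/sdd1 is the device
# from this thread; /mnt/spare/sdd1.img is a hypothetical destination.
run() { echo "+ $*"; }   # replace 'echo' with real execution when ready

run dd if=/dev/sdd1 of=/mnt/spare/sdd1.img bs=64M conv=noerror,sync
run mount -o loop,ro /mnt/spare/sdd1.img /mnt/recovery   # inspect the copy
```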
When I attempt to mount the filesystem I get the error: mount: Structure
needs cleaning
The kernel reports:
Apr 26 12:53:41 flash kernel: [388563.491665] XFS (sdd1): Mounting
Filesystem
Apr 26 12:53:41 flash kernel: [388563.503667] XFS (sdd1): Starting recovery
(logdev: internal)
Apr 26 12:53:41 flash kernel: [388563.509539] XFS: Internal error
XFS_WANT_CORRUPTED_GOTO at line 1530 of file
/home/abuild/rpmbuild/BUILD/kernel-default-3.1.10/linux-3.1/fs/xfs/xfs_alloc.c.
Caller 0xffffffffa005da7c
Apr 26 12:53:41 flash kernel: [388563.509540]
Apr 26 12:53:41 flash kernel: [388563.509542] Pid: 29146, comm: mount
Tainted: P 3.1.10-22-default #1
Apr 26 12:53:41 flash kernel: [388563.509544] Call Trace:
Apr 26 12:53:41 flash kernel: [388563.509554] [<ffffffff810042fa>]
dump_trace+0x9a/0x270
Apr 26 12:53:41 flash kernel: [388563.509558] [<ffffffff815266c3>]
dump_stack+0x69/0x6f
Apr 26 12:53:41 flash kernel: [388563.509589] [<ffffffffa005b304>]
xfs_free_ag_extent+0x564/0x7c0 [xfs]
Apr 26 12:53:41 flash kernel: [388563.509629] [<ffffffffa005da7c>]
xfs_free_extent+0xec/0x130 [xfs]
Apr 26 12:53:41 flash kernel: [388563.509670] [<ffffffffa008b900>]
xlog_recover_process_efi+0x160/0x1b0 [xfs]
Apr 26 12:53:41 flash kernel: [388563.509733] [<ffffffffa008cbf1>]
xlog_recover_process_efis.isra.8+0x61/0xb0 [xfs]
Apr 26 12:53:41 flash kernel: [388563.509795] [<ffffffffa00907f0>]
xlog_recover_finish+0x20/0xb0 [xfs]
Apr 26 12:53:41 flash kernel: [388563.509859] [<ffffffffa009337e>]
xfs_mountfs+0x43e/0x6b0 [xfs]
Apr 26 12:53:41 flash kernel: [388563.509923] [<ffffffffa00536cd>]
xfs_fs_fill_super+0x1bd/0x270 [xfs]
Apr 26 12:53:41 flash kernel: [388563.509948] [<ffffffff8114e6a4>]
mount_bdev+0x1b4/0x1f0
Apr 26 12:53:41 flash kernel: [388563.509951] [<ffffffff8114ef55>]
mount_fs+0x45/0x1d0
Apr 26 12:53:41 flash kernel: [388563.509955] [<ffffffff81167656>]
vfs_kern_mount+0x66/0xd0
Apr 26 12:53:41 flash kernel: [388563.509958] [<ffffffff81168a33>]
do_kern_mount+0x53/0x120
Apr 26 12:53:41 flash kernel: [388563.509961] [<ffffffff8116a4e5>]
do_mount+0x1a5/0x260
Apr 26 12:53:41 flash kernel: [388563.509964] [<ffffffff8116a98a>]
sys_mount+0x9a/0xf0
Apr 26 12:53:41 flash kernel: [388563.509968] [<ffffffff81546712>]
system_call_fastpath+0x16/0x1b
Apr 26 12:53:41 flash kernel: [388563.509972] [<00007f8e22dd397a>]
0x7f8e22dd3979
Apr 26 12:53:41 flash kernel: [388563.509977] XFS (sdd1): Failed to recover
EFIs
Apr 26 12:53:41 flash kernel: [388563.509979] XFS (sdd1): log mount finish
failed
If I run xfs_repair I get the following:
./xfs_repair -v /dev/sdd1
Phase 1 - find and verify superblock...
- block cache size set to 2282936 entries
Phase 2 - using internal log
- zero log...
zero_log: head block 6784 tail block 6528
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
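The order that error message asks for can be sketched as a dry run. Again this is an editor's sketch, not thread content: commands are echoed rather than executed, and the mount point is hypothetical.

```shell
# Dry-run sketch of the recovery order xfs_repair recommends.
# /dev/sdd1 is the device from this thread; /mnt/recovery is made up.
run() { echo "+ $*"; }   # replace 'echo' with real execution when ready

run mount -t xfs /dev/sdd1 /mnt/recovery   # 1. mount so the kernel replays the log
run umount /mnt/recovery                   # 2. unmount cleanly
run xfs_repair -n /dev/sdd1                # 3. dry-run check first
run xfs_repair /dev/sdd1                   # 4. actual repair
# Only if the mount itself fails (as it did here):
run xfs_repair -L /dev/sdd1                # last resort: zero the log
```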
I am running the Linux kernel 3.1.10-22 (openSUSE) and xfsprogs 3.1.8.
When I did the repair I had to blow away the log, and I had to use xfs_db
to fix some cases where blocks were claimed by multiple files. The
corruption occurred during a brief window, and the affected files were
generally things that are not important. I used xfs_db to identify those
files and deleted them. After several passes of running xfs_repair and
xfs_db and deleting files, I was able to recover the filesystem.
-Aaron
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: Problem recovering XFS filesystem
@ 2012-04-27 21:31 ` Michael Monnerie
  2012-04-28  2:04   ` Aaron Williams
0 siblings, 1 reply; 5+ messages in thread
From: Michael Monnerie @ 2012-04-27 21:31 UTC (permalink / raw)
To: xfs; +Cc: Aaron Williams

On Thursday, 26 April 2012 at 13:00:06, Aaron Williams wrote:
> I was able to recover the filesystem.

So your RAID busted the filesystem. Maybe the devs would want an
xfs_metadump of the FS from before your repair, so they can inspect it
and improve xfs_repair.

--
With kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [pronounced: Prot-e-schee]
Tel: +43 660 / 415 6531
* Re: Problem recovering XFS filesystem
@ 2012-04-28  2:04 ` Aaron Williams
  2012-04-29  0:35   ` Dave Chinner
0 siblings, 1 reply; 5+ messages in thread
From: Aaron Williams @ 2012-04-28  2:04 UTC (permalink / raw)
To: Michael Monnerie; +Cc: xfs

On Fri, Apr 27, 2012 at 2:31 PM, Michael Monnerie
<michael.monnerie@is.it-management.at> wrote:
> On Thursday, 26 April 2012 at 13:00:06, Aaron Williams wrote:
> > I was able to recover the filesystem.
>
> So your RAID busted the filesystem. Maybe the devs would want an
> xfs_metadump of the FS from before your repair, so they can inspect it
> and improve xfs_repair.

Hi Michael,

It appears that way, or it may be that I mounted with nobarrier and, in
the process of recovering the RAID, the information in the battery-backed
RAID cache was lost.

I have an Areca ARC-1210 controller that was in the middle of rebuilding
when I attempted to shut down and reboot my Linux system after I
mistakenly unplugged the wrong drive from my RAID array. Another drive
had failed on me earlier, and the array had finished rebuilding itself
onto a hot-spare drive. I intended to remove the bad drive to replace it,
but disconnected the wrong one. After I reconnected the good drive, the
array started rebuilding itself again. At that point I decided it might
be safer to shut down Linux to replace the drive, expecting the RAID
controller to pick up the rebuild where it left off. Linux did not shut
down all the way, however. I don't know if it was waiting for the array
to finish rebuilding or if something else happened. Eventually I hit the
reset button. The RAID BIOS reported it could not find the array, and I
had to rebuild the array configuration. I also ran a volume check, which
found and repaired about 70,000 blocks. Needless to say, I was quite
nervous.

Once that was done, Linux refused to mount the XFS partition, I think due
to corruption in the log.

I have an image of my pre-repair filesystem, made with dd, and can try to
do a metadump. The filesystem is 1.9 TB in size, with about 1.2 TB of
data in use.

It looks like I was able to recover everything fine after blowing away
the log. I see a bunch of files recovered in lost+found, but those all
appear to be files like cached web pages, etc.

I also dumped the log to a file (128 MB).

So far it looks like any actual data loss is minimal (thankfully), and
this was a good wake-up call to start doing more frequent backups.

I also upgraded xfsprogs from 3.1.6-2.1.2 to 3.1.8, which did a much
better job at recovery than my previous attempt.

It would be nice if xfs_db would allow me to continue when the log is
dirty instead of requiring me to mount the filesystem first. It also
would be nice if xfs_logprint could try to identify the filenames of the
inodes involved.

I understand that there are plans to update XFS to include the UID in all
of the on-disk structures. Any idea on when this will happen?

-Aaron
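A metadump of the pre-repair image, as Michael suggested, could look roughly like the following. This is an editor's dry-run sketch, not a command from the thread: the paths are hypothetical, commands are echoed rather than executed, and note that xfs_metadump captures metadata only (no file data) and obfuscates filenames by default.

```shell
# Dry-run sketch: produce a metadata-only dump the developers can inspect.
# Paths are made up; -g prints progress while dumping.
run() { echo "+ $*"; }   # replace 'echo' with real execution when ready

run xfs_metadump -g /mnt/spare/sdd1.img /tmp/sdd1.metadump
run xz /tmp/sdd1.metadump                    # compress before sharing
# xfs_mdrestore turns the dump back into a sparse image for inspection:
run xfs_mdrestore /tmp/sdd1.metadump /tmp/sdd1-restored.img
```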
* Re: Problem recovering XFS filesystem
@ 2012-04-29  0:35 ` Dave Chinner
  2012-04-29 21:55   ` Aaron Williams
0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2012-04-29  0:35 UTC (permalink / raw)
To: Aaron Williams; +Cc: Michael Monnerie, xfs

On Fri, Apr 27, 2012 at 07:04:48PM -0700, Aaron Williams wrote:
> Hi Michael,

<snip story of woe>

> Once that was done Linux refused to mount the XFS partition, I think due to
> corruption in the log.

The reason will be in the log, e.g. dmesg | tail -100 usually tells
you why it failed to mount.

> I have an image of my pre-repaired filesystem by using dd and can try and
> do a meta dump. The filesystem is 1.9TB in size with about 1.2TB of data in
> use.

ISTR that metadump needs the log to be clean first, too.

> It looks like I was able to recover everything fine after blowing away the
> log. I see a bunch of files recovered in lost+found but those all appear to
> be files like cached web pages, etc.
>
> I also dumped the log to a file (128M).
>
> So far it looks like any actual data loss is minimal (thankfully) and was a
> good wakeup call to start doing more frequent backups.
>
> I also upgraded xfsprogs from 3.1.6-2.1.2 to 3.1.8 which did a much better
> job at recovery than my previous attempt.

That's good to know ;)

> It would be nice if xfs_db would allow me to continue when the log is dirty
> instead of requiring me to mount the filesystem first.

Log recovery is done by the kernel code, not userspace, which is why
there is this requirement. If the kernel can't replay it, then you have
to use xfs_repair to zero it. Unfortunately, you can't just zero the
log with xfs_repair - you could do it hackily by terminating xfs_repair
just after it has zeroed the log....

> It also would be
> nice if xfs_logprint could try and identify the filenames of the inodes
> involved.

xfs_logprint just analyses the log transactions - it knows nothing
about the structure of the filesystem and doesn't even mount it. If you
want to know the names of the inodes, then use xfs_db once you have the
inode numbers in question. That requires a full filesystem traversal to
find the name for the inode number in question, so it can be *very*
slow. Given that there can be hundreds of thousands of unique inodes in
the log, that sort of translation would be *extremely* expensive.

> I understand that there are plans to update XFS to include the UID

UUID, not UID.

> in all of the on-disk structures. Any idea on when this will
> happen?

When it is ready. And then you'll have to mkfs a new filesystem to use
it, because it can't be retro-fitted to existing filesystems.... I'm
already pushing the infrastructure changes needed to support the new
on-disk functionality into the kernel, so the timeframe is months for
experimental support of the new on-disk format....

Cheers,

Dave.

--
Dave Chinner
david@fromorbit.com
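The inode-number-to-pathname lookup Dave describes can be sketched with xfs_db's blockget/ncheck commands. This is an editor's dry-run sketch under assumptions: commands are echoed rather than executed, the inode number is invented, and as Dave notes the lookup requires a slow full-filesystem traversal.

```shell
# Dry-run sketch: map inode numbers (e.g. from xfs_logprint or repair
# output) back to pathnames. blockget -n collects the name information
# that ncheck reports; the xfs_ncheck wrapper performs the same walk.
run() { echo "+ $*"; }   # replace 'echo' with real execution when ready

run xfs_db -r -c 'blockget -n' -c ncheck /dev/sdd1
run xfs_ncheck -i 132345 /dev/sdd1   # inode number 132345 is made up
```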
* Re: Problem recovering XFS filesystem
@ 2012-04-29 21:55 ` Aaron Williams
0 siblings, 0 replies; 5+ messages in thread
From: Aaron Williams @ 2012-04-29 21:55 UTC (permalink / raw)
To: Dave Chinner; +Cc: Michael Monnerie, xfs

Hi Dave,

On 04/28/2012 05:35 PM, Dave Chinner wrote:
> On Fri, Apr 27, 2012 at 07:04:48PM -0700, Aaron Williams wrote:
>> Once that was done Linux refused to mount the XFS partition, I think due to
>> corruption in the log.
> The reason will be in the log. e.g dmesg |tail -100 usually tells
> you why it failed to mount.

I should have included the dmesg output earlier. Here it is:

Apr 26 12:41:00 flash kernel: [387803.170457] XFS (sdd1): Mounting Filesystem
Apr 26 12:41:00 flash kernel: [387803.181638] XFS (sdd1): Starting recovery (logdev: internal)
Apr 26 12:41:00 flash kernel: [387803.453411] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1530 of file /home/abuild/rpmbuild/BUILD/kernel-default-3.1.10/linux-3.1/fs/xfs/xfs_alloc.c. Caller 0xffffffffa005da7c
Apr 26 12:41:00 flash kernel: [387803.453414]
Apr 26 12:41:00 flash kernel: [387803.453418] Pid: 28185, comm: mount Tainted: P 3.1.10-22-default #1
Apr 26 12:41:00 flash kernel: [387803.453421] Call Trace:
Apr 26 12:41:00 flash kernel: [387803.453436] [<ffffffff810042fa>] dump_trace+0x9a/0x270
Apr 26 12:41:00 flash kernel: [387803.453443] [<ffffffff815266c3>] dump_stack+0x69/0x6f
Apr 26 12:41:00 flash kernel: [387803.453486] [<ffffffffa005b304>] xfs_free_ag_extent+0x564/0x7c0 [xfs]
Apr 26 12:41:00 flash kernel: [387803.453562] [<ffffffffa005da7c>] xfs_free_extent+0xec/0x130 [xfs]
Apr 26 12:41:00 flash kernel: [387803.453641] [<ffffffffa008b900>] xlog_recover_process_efi+0x160/0x1b0 [xfs]
Apr 26 12:41:00 flash kernel: [387803.453763] [<ffffffffa008cbf1>] xlog_recover_process_efis.isra.8+0x61/0xb0 [xfs]
Apr 26 12:41:00 flash kernel: [387803.453884] [<ffffffffa00907f0>] xlog_recover_finish+0x20/0xb0 [xfs]
Apr 26 12:41:00 flash kernel: [387803.454009] [<ffffffffa009337e>] xfs_mountfs+0x43e/0x6b0 [xfs]
Apr 26 12:41:00 flash kernel: [387803.454132] [<ffffffffa00536cd>] xfs_fs_fill_super+0x1bd/0x270 [xfs]
Apr 26 12:41:00 flash kernel: [387803.454180] [<ffffffff8114e6a4>] mount_bdev+0x1b4/0x1f0
Apr 26 12:41:00 flash kernel: [387803.454186] [<ffffffff8114ef55>] mount_fs+0x45/0x1d0
Apr 26 12:41:00 flash kernel: [387803.454192] [<ffffffff81167656>] vfs_kern_mount+0x66/0xd0
Apr 26 12:41:00 flash kernel: [387803.454197] [<ffffffff81168a33>] do_kern_mount+0x53/0x120
Apr 26 12:41:00 flash kernel: [387803.454202] [<ffffffff8116a4e5>] do_mount+0x1a5/0x260
Apr 26 12:41:00 flash kernel: [387803.454208] [<ffffffff8116a98a>] sys_mount+0x9a/0xf0
Apr 26 12:41:00 flash kernel: [387803.454214] [<ffffffff81546712>] system_call_fastpath+0x16/0x1b
Apr 26 12:41:00 flash kernel: [387803.454222] [<00007f171d3bb97a>] 0x7f171d3bb979
Apr 26 12:41:00 flash kernel: [387803.454230] XFS (sdd1): Failed to recover EFIs
Apr 26 12:41:00 flash kernel: [387803.454232] XFS (sdd1): log mount finish failed

>> I have an image of my pre-repaired filesystem by using dd and can try and
>> do a meta dump. The filesystem is 1.9TB in size with about 1.2TB of data in
>> use.
> ISTR that metadump needs the log to be clean first, too.

What is ISTR?

> Cheers,
>
> Dave.
end of thread, other threads: [~2012-04-29 21:55 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed --
links below jump to the message on this page):
2012-04-26 20:00 Problem recovering XFS filesystem Aaron Williams
2012-04-27 21:31 ` Michael Monnerie
2012-04-28  2:04   ` Aaron Williams
2012-04-29  0:35     ` Dave Chinner
2012-04-29 21:55       ` Aaron Williams
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox