From: Eric Sandeen <sandeen@sandeen.net>
To: Alberto Accomazzi <aaccomazzi@gmail.com>
Cc: xfs@oss.sgi.com
Subject: Re: help with xfs_repair on 10TB fs
Date: Sat, 17 Jan 2009 11:33:33 -0600
Message-ID: <4972166D.5000006@sandeen.net>
In-Reply-To: <adcf4ef70901170913l693376d7s6fd0395e2c88e10@mail.gmail.com>
Alberto Accomazzi wrote:
> I need some help with figuring out how to repair a large XFS
> filesystem (10TB of data, 100+ million files). xfs_repair seems to
> have crapped out before finishing the job and now I'm not sure how to
> proceed.
>
> The system is a CentOS 5.2 storage server with a 3ware controller and
> 16 x 1TB drives, 32GB RAM and 64GB swap. After clearing the issues
> with bad blocks on the disks, yesterday we set out to fix the
> filesystem. This is the list of relevant packages that yum reports
> installed:
>
> kmod-xfs.x86_64 0.4-1.2.6.18_53.1.14.e installed
> kmod-xfs.x86_64 0.4-2 installed
> kmod-xfs.x86_64 0.4-1.2.6.18_92.1.10.e installed
> xfsdump.x86_64 2.2.46-1.el5.centos installed
> xfsprogs.x86_64 2.9.4-1.el5.centos installed
> xfsprogs-devel.x86_64 2.9.4-1.el5.centos installed
> kernel.x86_64 2.6.18-92.1.13.el5.cen installed
How did it "crap out"?
You could pretty easily run the very latest xfsprogs here by rebuilding
the src.rpm from
http://kojipkgs.fedoraproject.org/packages/xfsprogs/2.10.2/3.fc11/src/
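(Roughly, on CentOS 5 that should just be something like the following --
untested, and the exact .src.rpm filename in that directory is a guess on
my part:

  wget http://kojipkgs.fedoraproject.org/packages/xfsprogs/2.10.2/3.fc11/src/xfsprogs-2.10.2-3.fc11.src.rpm
  rpmbuild --rebuild xfsprogs-2.10.2-3.fc11.src.rpm
  rpm -Uvh /usr/src/redhat/RPMS/x86_64/xfsprogs-2.10.2-*.x86_64.rpm

You may need the usual build deps (gcc, libtool, e2fsprogs-devel, etc.)
installed before rpmbuild will run cleanly.)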
> After bringing the system back, a mount of the fs reported problems:
>
> Starting XFS recovery on filesystem: sdb1 (logdev: internal)
> Filesystem "sdb1": XFS internal error xfs_btree_check_sblock at line 334 of file
> /home/buildsvn/rpmbuild/BUILD/xfs-kmod-0.4/_kmod_build_/xfs_btree.c. Caller 0x
> ffffffff882fa8d2
So log replay is failing now, but that indicates an unclean shutdown.
Something else must have happened between the xfs_repair and this mount
instance?
> Call Trace:
> [<ffffffff882eacc9>] :xfs:xfs_btree_check_sblock+0xbc/0xcb
> .....
>
> An xfs_check on the device suggests how to solve the problem:
>
> alberto@adsduo-54: sudo xfs_check /dev/sdb1
> ERROR: The filesystem has valuable metadata changes in a log which needs to
> be replayed. Mount the filesystem to replay the log, and unmount it before
> re-running xfs_check. If you are unable to mount the filesystem, then use
> the xfs_repair -L option to destroy the log and attempt a repair.
> Note that destroying the log may cause corruption -- please attempt a mount
> of the filesystem before doing this.
Just means that you have a dirty log.
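(In general the sequence that message recommends is: mount so the kernel
replays the log, unmount, and only fall back to -L if the mount itself
fails. Schematically -- with /mnt just a placeholder mount point:

  mount /dev/sdb1 /mnt && umount /mnt   # log replay happens at mount time
  xfs_repair /dev/sdb1                  # normal repair, once the log is clean
  xfs_repair -L /dev/sdb1               # last resort: zero the dirty log first

Since the mount above died in xfs_btree_check_sblock, the -L route is what
was left.)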
> xfs_info reports the following for the filesystem:
>
> meta-data=/dev/sdb1              isize=256    agcount=32, agsize=98361855 blks
>          =                       sectsz=512   attr=0
> data     =                       bsize=4096   blocks=3147579360, imaxpct=25
>          =                       sunit=0      swidth=0 blks, unwritten=1
> naming   =version 2              bsize=4096
> log      =internal               bsize=4096   blocks=32768, version=1
>          =                       sectsz=512   sunit=0 blks, lazy-count=0
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>
> So last night I started an "xfs_repair -L" on the device, which
> proceeded through step 6 before quitting at some point in the middle
> of the night without giving me many clues as to what went wrong. I
> know that this process uses a ton of memory so we loaded the server
> with 32GB of RAM (the swap file is 64GB) and before going to sleep I
> noticed that the xfs_repair was using about 24GB of RAM. I put the
> complete log of xfs_repair online at:
> http://www.cfa.harvard.edu/~alberto/ads/xfs_repair.log
wow, that's messy
> bad hash table for directory inode 58134992 (no data entry): rebuilding
> rebuilding directory inode 58134992
> rebuilding directory inode 58345355
> rebuilding directory inode 60221905
>
> So I'm led to believe that xfs_repair died before completing the job.
> Should I try again? Does anyone have an idea why this might have
> happened? Is it possible that we still don't have enough memory in
> the system for xfs_repair to do the job? Also, it's not clear to me
> how xfs_repair works. Assuming we won't be able to get it to complete
> all of its steps, has it in fact repaired the filesystem somewhat or
> are all the changes mentioned while it runs not committed to the
> filesystem until the end of the run?
I don't see any evidence of it dying in the logs; it looks like either
it's still progressing, or it's stuck.
> For lack of better ideas I'm running an xfs_check at the moment. It's
> been running for close to an hour and has used almost 29GB of memory
> so far. No errors reported.
xfs_check doesn't actually repair anything, just FWIW.
I'd rebuild the srpm I mentioned above and give xfs_repair another shot
with that newer version, at this point.
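(If it wedges again, a verbose run captured to a file would at least give
us more to look at -- something along the lines of:

  xfs_repair -v /dev/sdb1 2>&1 | tee xfs_repair-2.10.2.log

-v just adds more progress output; the log filename is of course arbitrary.)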
-Eric
> TIA,
>
> -- Alberto
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs