From: Jay Ashworth <jra@baylink.com>
To: xfs@oss.sgi.com
Subject: Re: XFS filesystem triggering assert in _repair 3.1.11
Date: Sun, 14 Jul 2013 12:14:21 -0400 (EDT)
Message-ID: <11307192.1374.1373818461931.JavaMail.root@benjamin.baylink.com>
In-Reply-To: <51E2C912.8080305@sandeen.net>
----- Original Message -----
> From: "Eric Sandeen" <sandeen@sandeen.net>
> On 7/13/13 4:29 PM, Jay Ashworth wrote:
> ...
>
> > That's where I am right now: the drive was throwing a kernel oops if
> > I mounted it,
>
> That shouldn't happen, for starters - was this on the older 2.6.37
> kernel?
Correct. It also threw btree errors on that kernel *and* the 3.7 liveCD,
but never oopsed the 3.7.
> > and xfs_repair would just lock up. I had to do a -L on it
>
> ok, so much for debugging the oops ...
Yeah, sorry. Thankfully, it's summer hiatus, but it is a production
box, which sometimes limits how long I can keep problems around before
brute forcing them. I *have* the oops, but no longer the FS that caused
it.
> > after which it would mount and unmount cleanly, and xfs_repair runs
> > and finds problems, but then fails an assert at the end and dies.
> >
> > Here's that entire repair run:
> >
> > =============================================================
> > plaintain:/var/log/mythtv # xfs_repair /dev/sdc2
> > Phase 1 - find and verify superblock...
> > Not enough RAM available for repair to enable prefetching.
>
> ...
>
> > entry "1011_20130509205900.mpg" at block 13 offset 4016 in directory
> > inode 1073789184 references free inode 1137017084
> > clearing inode number in entry at offset 4016...
> > bad back (left) sibling pointer (saw 16140901064495857663 should be
> > NULL (0))
> ^^^ 0xDFFFFFFFFFFFFFFF i.e. -2
>
> #define HOLESTARTBLOCK ((xfs_fsblock_t)-2LL) ?
>
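For what it's worth, the arithmetic on that value can be checked with a minimal Python sketch. Only the number comes from the repair output. Treating the pointer as an unsigned 64-bit value and comparing it against -1 and -2 are assumptions made for illustration:

```python
# Compare the sibling pointer xfs_repair printed against -1 and -2
# when both are viewed as unsigned 64-bit integers (an assumption).
MASK64 = (1 << 64) - 1

saw = 16140901064495857663  # value from the "bad back (left) sibling pointer" line


def as_u64(n):
    """Return n reinterpreted as an unsigned 64-bit integer."""
    return n & MASK64


minus_one = as_u64(-1)  # all 64 bits set
minus_two = as_u64(-2)  # what (xfs_fsblock_t)-2LL would look like

flipped = saw ^ minus_one  # bits where 'saw' differs from all-ones

print("saw       =", hex(saw))
print("-2 as u64 =", hex(minus_two))
print("differs from all-ones in bit", flipped.bit_length() - 1)
```

So the value is 0xDFFFFFFFFFFFFFFF as stated, but that is not -2 (which would be 0xFFFFFFFFFFFFFFFE). It is all ones with a single bit, bit 61, cleared. Whether that points at a flipped bit in an otherwise null pointer is a guess, not something the repair output proves.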
> > in inode 1115989006 (data fork) bmap btree block 107963248
> > xfs_repair: dinode.c:2136: process_inode_data_fork: Assertion `err == 0' failed.
>
> This means we were in the check_dups path, and one of the process_*()
> functions failed. Due to that "bad back (left) sibling pointer ..."
>
> If I had time to work on this, I'd ask for an xfs_metadump image of
> the filesystem to be able to reproduce it and look further into the
> problem...
>
> It might shed some light on things to use xfs_db to look at inode
> 1115989006
>
> # xfs_db /dev/sdc2
> xfs_db> inode 1115989006
> xfs_db> p
xfs_db> inode 1115989006
xfs_db> p
core.magic = 0x494e
core.mode = 0100666
core.version = 2
core.format = 3 (btree)
core.nlinkv2 = 1
core.onlink = 0
core.projid_lo = 0
core.projid_hi = 0
core.uid = 111
core.gid = 33
core.flushiter = 18
core.atime.sec = Wed Jul 3 19:28:22 2013
core.atime.nsec = 956870002
core.mtime.sec = Tue Jan 29 20:00:10 2013
core.mtime.nsec = 466912274
core.ctime.sec = Fri Jul 12 13:37:43 2013
core.ctime.nsec = 217838130
core.size = 916961916
core.nblocks = 223869
core.extsize = 0
core.nextents = 16
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.filestream = 0
core.gen = 3501711335
next_unlinked = null
u.bmbt.level = 1
u.bmbt.numrecs = 1
u.bmbt.keys[1] = [startoff] 1:[0]
u.bmbt.ptrs[1] = 1:107963248
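
As a rough sanity check on the inode core above, core.size and core.nblocks can be compared. This is a sketch under assumptions that are not in the output itself: a 4096-byte filesystem block size, a file with no holes or extra preallocation, and exactly one bmap btree block (the single pointer shown in u.bmbt.ptrs[1]).

```python
# Does core.nblocks match core.size, given the stated assumptions?
BLOCK_SIZE = 4096      # ASSUMPTION: filesystem block size, not shown in the output

size_bytes = 916961916  # core.size
nblocks = 223869        # core.nblocks
bmbt_blocks = 1         # ASSUMPTION: one btree block, per u.bmbt.ptrs[1]

# Integer ceiling division: blocks needed to hold size_bytes of data.
data_blocks = -(-size_bytes // BLOCK_SIZE)

expected = data_blocks + bmbt_blocks
print("data blocks:", data_blocks)
print("expected nblocks:", expected, "reported nblocks:", nblocks)
print("consistent:", expected == nblocks)
```

Under those assumptions the counts agree, so at least the inode core looks self-consistent; the damage would then be in the btree block it points to.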
> looking at bmap btree block 107963248 might also be interesting; like
> this I think but I'm rusty:
>
> xfs_db> fsblock 107963248
> xfs_db> type bmapbt
Well, the manpage says that's a type, but my xfs_db, v 3.1.11, says it's not. Huh?
> xfs_db> p
>
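The same xfs_db commands can also be run non-interactively, which makes the output easier to capture and post. A minimal sketch that only builds the command line, using xfs_db's -r (read-only) and -c (run one command) options; the helper name xfs_db_argv is made up for illustration:

```python
import shlex


def xfs_db_argv(device, commands):
    """Build an argv list that runs each xfs_db command in order, read-only.

    Hypothetical helper: it only assembles the command line and does not
    execute anything or touch the device.
    """
    argv = ["xfs_db", "-r"]
    for cmd in commands:
        argv += ["-c", cmd]
    argv.append(device)
    return argv


# The inode dump suggested earlier in the thread.
argv = xfs_db_argv("/dev/sdc2", ["inode 1115989006", "p"])
print(shlex.join(argv))
```

The list could be handed to subprocess.run() with the filesystem unmounted. The printed form is what one would paste into a shell.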
> > Aborted
> > =============================================================
> >
> > This is xfs_repair 3.1.11, from xfsprogs 3.1.11 from tarball, compiled
> > on the machine in question, which is a 32-bit OS with 512MB of ram
> > (the mobo, an old MSI KT6V, pukes if we try to put more ram on it for
> > some reason). I have run memtest+ on the ram and multiple passes come
> > back clean as a whistle; the SATA controller is a SiI 3114, which we
> > had to buy to talk to the 3TB drives; boot is from the VT6420 on the
> > motherboard and a dedicated 40G Samsung.
> >
> > I have done some work on this repair booted from a Suse 12.1 rescue
> > disk with a 3.7 kernel, on the theory that the XFS drivers in the
> > kernel might help; I found that mounting and unmounting in between
> > multiple repair runs made me have to do less of them -- though I'm
> > sure more than two dirty runs before one sees a clean one ought to be
> > Right Out anyway.
>
> Eek, so you thrashed about, in other words. ;)
I've been at this over a week. Yes, there's been some thrashing. I have a
2TB that I need to dedupe and re-mkfs, so I have space to work on; that
process itself is hanging against a *different* XFS problem on a different
filesystem. (Specifically, I have one bad inode on that FS that repair
doesn't seem to want to touch. It's been lower priority cause that data's
duped, but as I need the free space more, its priority is rising.)
I hate power supplies.
> > I've seen suggestions on the mailing list archives and other places
> > that (some) assertion fails were for things fixed in earlier tools
> > releases, but that one's not helping me...
>
> well, not always true, esp. in userspace.
>
> > I have space to move this data off and remake the filesystem,
> > if I can get it to mount reliably and stay that way long enough.
>
> you can always mount it & copy as much as possible until you hit
> corruption. But until repair succeeds you'll have corruption lurking
> that you'll hit which will probably cause the fs to shut down
> (gracefully, in theory).
Well, the bottom half shuts down, but then the top half keeps going,
throwing error 5's all night.
Cheers,
-- jra
--
Jay R. Ashworth Baylink jra@baylink.com
Designer The Things I Think RFC 2100
Ashworth & Associates http://baylink.pitas.com 2000 Land Rover DII
St Petersburg FL USA #natog +1 727 647 1274