From: Ben Myers <bpm@sgi.com>
To: "Mears, Morgan" <Morgan.Mears@netapp.com>
Cc: "xfs@oss.sgi.com" <xfs@oss.sgi.com>
Subject: Re: Possible XFS bug encountered in 3.14.0-rc3+
Date: Thu, 13 Mar 2014 10:31:48 -0500 [thread overview]
Message-ID: <20140313153148.GV1935@sgi.com> (raw)
In-Reply-To: <33A0129EBFD46748804DE81B354CA1B21C0DDBAE@SACEXCMBX06-PRD.hq.netapp.com>
On Thu, Mar 13, 2014 at 02:47:58PM +0000, Mears, Morgan wrote:
> On Wed, Mar 12, 2014 at 07:03:14PM -0400, Dave Chinner wrote:
> > On Wed, Mar 12, 2014 at 08:14:32PM +0000, Mears, Morgan wrote:
> >> Hi,
> >>
> >> Please CC me on any responses; I don't subscribe to this list.
> >>
> >> I ran into a possible XFS bug while doing some Oracle benchmarking. My test
> >> system is running a 3.14.0-rc3+ kernel built from the for-next branch of
> >> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git
> >> on 2014-02-19 (last commit 1342f11e713792e53e4b7aa21167fe9caca81c4a).
> >>
> >> The XFS instance in question is 200 GB and should have all default
> >> parameters (mkfs.xfs /dev/mapper/<my_lun_partition>). It contains Oracle
> >> binaries and trace files. At the time the issue occurred I had been
> >> running Oracle with SQL*NET server tracing enabled. The affected XFS
> >> had filled up 100% with trace files several times; I was periodically
> >> executing rm -f * in the trace file directory, which would reduce the
> >> file system occupancy from 100% to 3%. I had an Oracle load generating
> >> tool running, so new log files were being created with some frequency.
> >>
> >> The issue occurred during one of my rm -f * executions; afterwards the
> >> file system would only produce errors. Here is the traceback:
> >>
> >> [1552067.297192] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file fs/xfs/xfs_alloc.c. Caller 0xffffffffa04c4905
> >
> > So, freeing a range that is already partially free. The problem
> > appears to be in AG 15, according to the repair output.
> >
> >> https://dl.dropboxusercontent.com/u/31522929/xfs-double-free-xfs_metadump-before-repair.gz
> >
> > AGF 15 is full:
> >
> > xfs_db> agf 15
> > xfs_db> p
> > magicnum = 0x58414746
> > versionnum = 1
> > seqno = 15
> > length = 3276783
> > bnoroot = 1
> > cntroot = 2
> > bnolevel = 1
> > cntlevel = 1
> > flfirst = 0
> > fllast = 3
> > flcount = 4
> > freeblks = 1
> > longest = 1
> > btreeblks = 0
> > uuid = 00000000-0000-0000-0000-000000000000
> > lsn = 0
> > crc = 0
> >
> > And the one free block (other than the minimum 4 on teh AGFL) is:
> >
> > xfs_db> p
> > magic = 0x41425442
> > level = 0
> > numrecs = 1
> > leftsib = null
> > rightsib = null
> > recs[1] = [startblock,blockcount] 1:[3119876,1]
> >
> > But:
> >
> > data fork in ino 940862056 claims dup extent, off - 11, start - 58836692, cnt 1
> > correcting nextents for inode 940862056
> > bad data fork in inode 940862056
> > would have cleared inode 940862056
> >
> > the block number here is in AG 14, which has much more free space:
> >
> > xfs_db> p
> > magicnum = 0x58414746
> > versionnum = 1
> > seqno = 14
> > length = 3276783
> > bnoroot = 1
> > cntroot = 2
> > bnolevel = 1
> > cntlevel = 1
> > flfirst = 42
> > fllast = 45
> > flcount = 4
> > freeblks = 2092022
> > longest = 2078090
> > btreeblks = 0
> >
> > which is in 2 extents:
> >
> > xfs_db> a bnoroot
> > xfs_db> p
> > magic = 0x41425442
> > level = 0
> > numrecs = 2
> > leftsib = null
> > rightsib = null
> > recs[1-2] = [startblock,blockcount] 1:[102466,13932] 2:[116476,2078090]
> > xfs_db> convert agno 14 agbno 102466 fsb
> > 0x3819042 (58822722)
> > xfs_db> convert agno 14 agbno 116476 fsb
> > 0x381c6fc (58836732)
> >
> > and so 58836692 is just short of the second free space. Looking at
> > all the other dup extent claims, they a remostly adjacent to the
> > left edge of these two free spaces. No surprise - that's the way
> > allocation occurs.
> >
> > So, we've got a state where the allocation btree contains a
> > corruption, so a shutdown occurs. The log has captured that
> > corruption when it was made, so log recovery reintroduces that
> > corruption. And so when the extent is freed after log recovery, the
> > corruption is tripped over again.
> >
> > There's two checkpoints in the log, both very small. The last
> > modification to AGI 14 is there before it:
> >
> > BUF: cnt:2 total:2 a:0x669350 len:24 a:0x6693d0 len:128
> > BUF: #regs:2 start blkno:0x15dff891 len:1 bmap size:1 flags:0x2800
> > AGF Buffer: (XAGF)
> > ver:1 seq#:14 len:3276783
> > root BNO:1 CNT:2
> > level BNO:1 CNT:1
> > 1st:42 last:45 cnt:4 freeblks:2092020 longest:2078090
> >
> > As is the freespace btree buffer modification:
> >
> > BUF: cnt:2 total:2 a:0x669460 len:24 a:0x6694e0 len:128
> > BUF: #regs:2 start blkno:0x15dff898 len:8 bmap size:1 flags:0x2000
> > BUF DATA
> > 0 42544241 4000000 ffffffff ffffffff cb620100 1000000 fc7c0100 2000000
> > magic = BTBA
> > level = 0
> > numrecs = 4
> > leftsib = NULLFSBLOCK
> > rightsib = NULLFSBLOCK
> > rec[0] = 0x162cb, 1 (90827,1)
> > rec[1] = 0x17cfc, 2 (97532,1)
> > 8 47900100 67360000 fcc60100 8ab51f00 71ff3100 5000000 71ff3100 5000000
> > rec[2] = 0x19047, 0x3667 (102471,13927)
> > rec[3] = 0x1c6fc, 0x1fb58a (116476,2078090)
> >
> > so, from the pre-recovery case above, we've got two new freespaces
> > in rec[0-1], rec[2] has 5 blocks removed from the left edge, and
> > rec[3] is unchanged.
> >
> > Confirming the ABTC buffer contains the same extents:
> >
> > BUF: cnt:2 total:2 a:0x669570 len:24 a:0x6695f0 len:128
> > BUF: #regs:2 start blkno:0x15dff8a0 len:8 bmap size:1 flags:0x2000
> > BUF DATA
> > 0 43544241 4000000 ffffffff ffffffff cb620100 1000000 fc7c0100 2000000
> > magic = CTBA
> > level = 0
> > numrecs = 4
> > leftsib = NULLFSBLOCK
> > rightsib = NULLFSBLOCK
> > rec[0] = 0x162cb, 1 (90827,1)
> > rec[1] = 0x17cfc, 2 (97532,1)
> > 8 47900100 67360000 fcc60100 8ab51f00 71ff3100 5000000 71ff3100 5000000
> > rec[2] = 0x19047, 0x3667 (102471,13927)
> > rec[3] = 0x1c6fc, 0x1fb58a (116476,2078090)
> > 10 8ca0100 8000000 71ff3100 5000000 71ff3100 5000000 71ff3100 5000000
> > 18 71ff3100 5000000 71ff3100 5000000 71ff3100 5000000 71ff3100 5000000
> >
> > It does. So the btrees contain consistent information, and so it's
> > unlikely that we have a free space btree corruption in the log. So
> > let's look at what was freed:
> >
> > The EFI/EFDs in the log are:
> >
> > EFI: cnt:1 total:1 a:0x668670 len:32
> > EFI: #regs:1 num_extents:1 id:0xffff881496024af0
> > (s: 0x38162cb, l: 1)
> > EFD: cnt:1 total:1 a:0x6684d0 len:32
> > EFD: #regs: 1 num_extents: 1 id: 0xffff881496024af0
> > ....
> > EFI: cnt:1 total:1 a:0x667da0 len:32
> > EFI: #regs:1 num_extents:1 id:0xffff8814960244b0
> > (s: 0x3817cfc, l: 1)
> > EFD: cnt:1 total:1 a:0x65fbd0 len:32
> > EFD: #regs: 1 num_extents: 1 id: 0xffff8814960244b0
> > ....
> > EFI: cnt:1 total:1 a:0x669250 len:32
> > EFI: #regs:1 num_extents:1 id:0xffff881496024000
> > (s: 0x3817cfd, l: 1)
> > EFD: cnt:1 total:1 a:0x6692d0 len:32
> > EFD: #regs: 1 num_extents: 1 id: 0xffff881496024000
> >
> > These are all in ag#14, and have agbnos of 0x162cb, 0x17cfc and
> > 0x17cfd. This corresponds exactly to the two new records in the AG
> > btree. The one remaining EFI doesn't have an EFD in th elog, so this
> > is what is being replayed:
> >
> > EFI: cnt:1 total:1 a:0x669f40 len:32
> > EFI: #regs:1 num_extents:1 id:0xffff881496024640
> > (s: 0x3869ff9, l: 1)
> >
> > Which is at agbno 0x69ff9 (434169). That's off to the right of the
> > start of the last freespace range in rec[3]. rec[3] starts at
> > 0x1c6fc and ends at 0x1fb58a. That means this efi clearly lands
> > inside that freespace range - it's not an edge case, it's deep in
> > the interior of the freespace range.
> >
> > The inode logged just before the EFI - likely it's owner:
> >
> > INO: cnt:2 total:2 a:0x668040 len:56 a:0x667fb0 len:96
> > INODE: #regs:2 ino:0x3862d698 flags:0x1 dsize:0
> > CORE inode:
> > magic:IN mode:0x81b0 ver:2 format:2 onlink:0
> > uid:1001 gid:1001 nlink:0 projid:0
> > atime:1394495104 mtime:1394495104 ctime:1394554526
> > flushiter:1
> > size:0x0 nblks:0x0 exsize:0 nextents:0 anextents:0
> > forkoff:13 dmevmask:0x0 dmstate:0 flags:0x0 gen:-121998876
> >
> > Is an unlinked inode that has had all it's block removed. Yup - it's
> > on the unlinked list:
> >
> > agi unlinked bucket 24 is 6477464 in ag 14 (inode=946001560)
> >
> > So, prior to recovery, what did it contain? it's got 287 bytes of
> > date, and a single extent:
> >
> > u.bmx[0] = [startoff,startblock,blockcount,extentflag] 0:[0,59154425,1,0]
> >
> > xfs_db> convert fsb 59154425 agno
> > 0xe (14)
> > xfs_db> convert fsb 59154425 agbno
> > 0x69ff9 (434169)
> >
> > Ok, so the corruption, whatever it was, happened a long time ago,
> > and it's only when removing the file that it was tripped over.
> > There's nothing more I can really get from this - the root cause of
> > the corruption is long gone.
> >
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > david@fromorbit.com
>
> Thanks Dave.
>
> Upon restarting my testing I immediately hit this error again (or a very
> similar one in any case). I suspect that the corruption you've noted was
> not properly repaired by xfs_repair.
There are some kinds of corruption that xfs_repair was finding but not
repairing until commit ea4a8de1e1. I suggest you upgrade if you don't have
this commit. How long ago did you make this filesystem?
Thanks,
Ben
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2014-03-13 15:31 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-03-12 20:14 Possible XFS bug encountered in 3.14.0-rc3+ Mears, Morgan
2014-03-12 21:43 ` Mark Tinguely
2014-03-12 23:06 ` Dave Chinner
2014-03-13 15:03 ` Mark Tinguely
2014-03-12 23:02 ` Dave Chinner
2014-03-13 14:47 ` Mears, Morgan
2014-03-13 15:31 ` Ben Myers [this message]
2014-03-13 16:56 ` Mears, Morgan
2014-03-13 22:58 ` Dave Chinner
2014-03-14 14:22 ` Mears, Morgan
2014-03-24 21:36 ` Mark Tinguely
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140313153148.GV1935@sgi.com \
--to=bpm@sgi.com \
--cc=Morgan.Mears@netapp.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).