From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>, linux-xfs <linux-xfs@vger.kernel.org>
Subject: Re: xfs clones crash issue - illegal state 13 in block map
Date: Mon, 18 Sep 2017 22:38:43 -0700 [thread overview]
Message-ID: <20170919053843.GA7117@magnolia> (raw)
In-Reply-To: <CAOQ4uxibPmwRS3P5A-k2JQvCX_hq9Mb_cSOThe-qcVd0136PpQ@mail.gmail.com>
On Fri, Sep 08, 2017 at 11:34:58AM +0300, Amir Goldstein wrote:
> On Thu, Sep 7, 2017 at 7:13 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > On Thu, Sep 07, 2017 at 03:58:56PM +0300, Amir Goldstein wrote:
> >> Hi guys,
> >>
> >> I am getting these errors often when running the crash tests
> >> with cloned files (generic/502 in my xfstests patches).
> >>
> >> Hitting these errors requires first fixing 2 other issues
> >> that shadow over this issue:
> >> "xfs: fix incorrect log_flushed on fsync" (in master)
> >> "xfs: fix leftover CoW extent after truncate"
> >> available on my tree based on Darrick's simple fix:
> >> https://github.com/amir73il/linux/commits/xfs-fsync
> >>
> >> I get the errors more often (1 out of 5) on a 100G fs on spinning disk.
> >> On a 10G fs on SSD they are less frequent.
> >> The log in this email was captured on patched stable 4.9.47 kernel,
> >> but I am getting the same errors on patched upstream kernel.
> >>
> >> I wasn't able to create a deterministic reproducer, so attaching
> >> the full log from a failed test along with an IO log that can be
> >> replayed on your disk to examine the outcome.
> >>
> >> Following is the output of fsx process #5, which is the process
> >> that wrote the problematic testfile5.mark0 to the log.
> >> This process performs only read,zero,fsync before creating
> >> the log mark.
> >> The file testfile5 was cloned from an origin 256K file before
> >> running fsx.
> >> Later, I used the random seed 35484 in this log for all
> >> processes and it seemed to increase the probability for failure.
> >>
> >> # /old/home/amir/src/xfstests-dev/ltp/fsx -N 100 -d -k -P
> >> /mnt/test/fsxtests -i /dev/mapper/logwrites-test -S 0 -j 5
> >> /mnt/scratch/testfile5
> >> Seed set to 35484
> >> file_size=262144
> >> 5: 1 read 0x3f959 thru 0x3ffff (0x6a7 bytes)
> >> 5: 2 zero from 0x3307e to 0x34f74, (0x1ef6 bytes)
> >> 5: 3 fsync
> >> 5: Dumped fsync buffer to testfile5.mark0
> >>
> >> In order to get to the crash state you need to get my
> >> xfstests replay-log patches and replay the attached log
> >> on a >= 100G scratch device:
> >>
> >> # ./src/log-writes/replay-log --log log.xfs.testfile5.mark0 --replay
> >> $SCRATCH_DEV --end-mark testfile5.mark0
> >> # mount $SCRATCH_DEV $SCRATCH_MNT
> >> # umount $SCRATCH_MNT
> >> # xfs_repair -n $SCRATCH_DEV
> >> Phase 1 - find and verify superblock...
> >> Phase 2 - using internal log
> >> - zero log...
> >> - scan filesystem freespace and inode maps...
> >> - found root inode chunk
> >> Phase 3 - for each AG...
> >> - scan (but don't clear) agi unlinked lists...
> >> - process known inodes and perform inode discovery...
> >> - agno = 0
> >>
> >> fatal error -- illegal state 13 in block map 376
> >>
> >> Can anyone provide some insight?
> >
> > Looks like I missed a couple of extent states in process_bmbt_reclist_int.
> >
> > What happens if you add the following (only compile tested) patch to
> > xfsprogs?
>
> This is what happens:
>
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
> - zero log...
> - scan filesystem freespace and inode maps...
> - found root inode chunk
> Phase 3 - for each AG...
> - scan (but don't clear) agi unlinked lists...
> - process known inodes and perform inode discovery...
> - agno = 0
> data fork in regular inode 134 claims CoW block 376
> correcting nextents for inode 134
> bad data fork in inode 134
> would have cleared inode 134
> - agno = 1
> - agno = 2
> - agno = 3
> - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
> - setting up duplicate extent list...
> unknown block state, ag 0, block 376
> unknown block state, ag 1, block 16
> - check for inodes claiming duplicate blocks...
> - agno = 0
> - agno = 1
> - agno = 2
> entry "testfile2" in shortform directory 128 references free inode 134
> - agno = 3
> would have junked entry "testfile2" in directory inode 128
> imap claims in-use inode 134 is free, would correct imap
> Missing reverse-mapping record for (0/376) len 1 owner 134 off 19
> No modify flag set, skipping phase 5
> Phase 6 - check inode connectivity...
> - traversing filesystem ...
> - traversal finished ...
> - moving disconnected inodes to lost+found ...
> Phase 7 - verify link counts...
> No modify flag set, skipping filesystem flush and exiting.
Hmm.
# xfs_db bugs.img
xfs_db> inode 134
xfs_db> bmap
data offset 0 startblock 24 (0/24) count 19 flag 0
data offset 19 startblock 376 (0/376) count 1 flag 0
data offset 25 startblock 8388624 (1/16) count 4 flag 1
data offset 29 startblock 53 (0/53) count 35 flag 0
Ok, so inode 134 thinks it owns block 376.
xfs_db> agf 0
xfs_db> addr refcntroot
xfs_db> p
...
recs[1-22] = [startblock,blockcount,refcount,cowflag]
1:[24,1,11,0]
...
22:[87,1,6,0]
xfs_db>
Ok, so the refcountbt doesn't know about block 376, so it's neither
shared nor a cow staging extent.
xfs_db> agf 0
xfs_db> addr rmaproot
xfs_db> p
...
recs[1-38] = [startblock,blockcount,owner,offset,extentflag,attrfork,bmbtblock]
1:[0,1,-3,0,0,0,0]
...
38:[376,1,-9,0,0,0,0]
xfs_db>
So it seems that the rmapbt is behind the times, since record 38 should
be [376,1,134,19,0,0,0]. Either that's because there's something wrong
with the regular deferred ops mechanism, or log recovery of deferred ops.
Let's have a look at the log...
# xfs_logprint bugs.img | grep x178 -B3
Oper (426): tid: 5c75a3f len: 48 clientid: TRANS flags: none
CUI: #regs: 1 num_extents: 2 id: 0xffff8f0e842f74b8
(s: 0x2b, l: 1, f: 0x2)
(s: 0x178, l: 1, f: 0x4)
...ok, so we XFS_REFCOUNT_FREE_COW block 376, which removes it from from
the refcountbt...
----------------------------------------------------------------------------
Oper (427): tid: 5c75a3f len: 48 clientid: TRANS flags: none
BUI: #regs: 1 num_extents: 1 id: 0xffff8f0e81f39000
(s: 0x178, l: 1, own: 134, off: 19, f: 0x1)
...now we map block 376 into the data fork of inode 134 at offset 19...
--
----------------------------------------------------------------------------
Oper (447): tid: 5c75a3f len: 48 clientid: TRANS flags: none
RUI: #regs: 1 num_extents: 1 id: 0xffff8f0e48254000
(s: 0x178, l: 1, own: -9, off: 0, f: 0x8)
...now we XFS_RMAP_EXTENT_FREE the CoW extent @ block 376 from the rmap,
which ought to remove the record...
--
----------------------------------------------------------------------------
Oper (453): tid: e0ecea6a len: 48 clientid: TRANS flags: none
RUI: #regs: 1 num_extents: 1 id: 0xffff8f0e48254530
(s: 0x178, l: 1, own: 134, off: 19, f: 0x2)
...and XFS_RMAP_EXTENT_MAP_SHARED a new rmapping for block 376, which
should add the [376,1,134,19,0,0,0] that we'd expect to be there...
--
----------------------------------------------------------------------------
Oper (13): tid: 17c5790c len: 48 clientid: TRANS flags: none
RUI: #regs: 1 num_extents: 1 id: 0xffff925955341f20
(s: 0x178, l: 1, own: 134, off: 19, f: 0x2)
...strange, another rmap add? Did the RUIs fail to complete?
# xfs_logprint bugs.img | egrep '(0xffff8f0e48254530|0xffff925955341f20)'
RUI: #regs: 1 num_extents: 1 id: 0xffff8f0e48254530
RUD: #regs: 1 id: 0xffff8f0e48254530
RUI: #regs: 1 num_extents: 1 id: 0xffff925955341f20
RUD: #regs: 1 id: 0xffff925955341f20
...the first RUI is logged as operation 453 in tid e0ecea6a, and it has
an RUD at 460. However, I don't see a COMMIT, so I gather that log
recovery tried to replay the RUI as operation 13 in tid 17c5790c.
----------------------------------------------------------------------------
Oper (13): tid: 17c5790c len: 48 clientid: TRANS flags: none
RUI: #regs: 1 num_extents: 1 id: 0xffff925955341f20
(s: 0x178, l: 1, own: 134, off: 19, f: 0x2)
----------------------------------------------------------------------------
Oper (14): tid: 17c5790c len: 16 clientid: TRANS flags: none
RUD: #regs: 1 id: 0xffff925955341f20
----------------------------------------------------------------------------
Oper (15): tid: 17c5790c len: 16 clientid: TRANS flags: none
RUD: #regs: 1 id: 0xffff8f0e48254000
----------------------------------------------------------------------------
Oper (16): tid: 17c5790c len: 24 clientid: TRANS flags: none
BUF: #regs: 3 start blkno: 40 (0x28) len: 8 bmap size: 1 flags: 0x2000
Oper (17): tid: 17c5790c len: 128 clientid: TRANS flags: none
BUF DATA
0 33424d52 31000000 ffffffff ffffffff 0 28000000 1000000 38000000
8 3674196c 564cfddf 7798e496 591b6410 0 42249385 0 1000000
10 ffffffff fdffffff 0 0 1000000 2000000 ffffffff fbffffff
18 0 0 3000000 2000000 ffffffff faffffff 0 0
Oper (18): tid: 17c5790c len: 128 clientid: TRANS flags: none
BUF DATA
0 0 0 4d010000 9000000 0 8a000000 0 2a000000
8 56010000 d000000 ffffffff f7ffffff 0 0 65010000 6b000000
10 ffffffff f7ffffff 0 0 78010000 1000000 0 86000000
18 0 13000000 0 0 0 0 0 0
----------------------------------------------------------------------------
Oper (19): tid: 17c5790c len: 0 clientid: TRANS flags: COMMIT
Ok, so those last two lines in operation 16 show that we've indeed added the
new record to the rmapbt... <shrug> <confused> <tired>??
Are we sure that the log recovery actions actually get checkpointed
correctly? And just to check my own sanity, we generate a snapshot and
then recover it and run xfs_repair, right?
--D
>
> >
> > (Normally I'd say send a metadump too for us mere mortals to work with,
> > though I'm about to plunge into weddingland so I likely won't be able to
> > do much until the 18th.)
> >
>
> Attached (used xfs_metadump -ao)
> Soon we will all be gods with powers to replay history ;)
>
> > ((Eric: If this doesn't turn out to be a totally garbage patch, feel
> > free to add it to xfsprogs.))
> >
> > --D
> >
next prev parent reply other threads:[~2017-09-19 5:38 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-09-07 12:58 xfs clones crash issue - illegal state 13 in block map Amir Goldstein
2017-09-07 16:13 ` Darrick J. Wong
2017-09-08 8:34 ` Amir Goldstein
2017-09-19 5:38 ` Darrick J. Wong [this message]
2017-09-19 6:16 ` Amir Goldstein
2017-10-09 12:48 ` Hou Tao
2017-10-10 19:18 ` Darrick J. Wong
2017-11-22 18:25 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170919053843.GA7117@magnolia \
--to=darrick.wong@oracle.com \
--cc=amir73il@gmail.com \
--cc=hch@lst.de \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).