From: Amir Goldstein <amir73il@gmail.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>, linux-xfs <linux-xfs@vger.kernel.org>
Subject: Re: xfs clones crash issue - illegal state 13 in block map
Date: Fri, 8 Sep 2017 11:34:58 +0300
Message-ID: <CAOQ4uxibPmwRS3P5A-k2JQvCX_hq9Mb_cSOThe-qcVd0136PpQ@mail.gmail.com>
In-Reply-To: <20170907161330.GA6540@magnolia>
On Thu, Sep 7, 2017 at 7:13 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> On Thu, Sep 07, 2017 at 03:58:56PM +0300, Amir Goldstein wrote:
>> Hi guys,
>>
>> I am getting these errors often when running the crash tests
>> with cloned files (generic/502 in my xfstests patches).
>>
>> Hitting these errors requires first fixing 2 other issues
>> that mask this one:
>> "xfs: fix incorrect log_flushed on fsync" (in master)
>> "xfs: fix leftover CoW extent after truncate"
>> available on my tree based on Darrick's simple fix:
>> https://github.com/amir73il/linux/commits/xfs-fsync
>>
>> I get the errors more often (1 out of 5) on a 100G fs on spinning disk.
>> On a 10G fs on SSD they are less frequent.
>> The log in this email was captured on a patched stable 4.9.47 kernel,
>> but I am getting the same errors on a patched upstream kernel.
>>
>> I wasn't able to create a deterministic reproducer, so attaching
>> the full log from a failed test along with an IO log that can be
>> replayed on your disk to examine the outcome.
>>
>> Following is the output of fsx process #5, which is the process
>> that wrote the problematic testfile5.mark0 to the log.
>> This process performs only read, zero and fsync before creating
>> the log mark.
>> The file testfile5 was cloned from an origin 256K file before
>> running fsx.
>> Later, I used the random seed 35484 from this log for all
>> processes and it seemed to increase the probability of failure.
>>
>> # /old/home/amir/src/xfstests-dev/ltp/fsx -N 100 -d -k -P /mnt/test/fsxtests -i /dev/mapper/logwrites-test -S 0 -j 5 /mnt/scratch/testfile5
>> Seed set to 35484
>> file_size=262144
>> 5: 1 read 0x3f959 thru 0x3ffff (0x6a7 bytes)
>> 5: 2 zero from 0x3307e to 0x34f74, (0x1ef6 bytes)
>> 5: 3 fsync
>> 5: Dumped fsync buffer to testfile5.mark0
>>
>> In order to get to the crash state you need to get my
>> xfstests replay-log patches and replay the attached log
>> on a >= 100G scratch device:
>>
>> # ./src/log-writes/replay-log --log log.xfs.testfile5.mark0 --replay $SCRATCH_DEV --end-mark testfile5.mark0
>> # mount $SCRATCH_DEV $SCRATCH_MNT
>> # umount $SCRATCH_MNT
>> # xfs_repair -n $SCRATCH_DEV
>> Phase 1 - find and verify superblock...
>> Phase 2 - using internal log
>> - zero log...
>> - scan filesystem freespace and inode maps...
>> - found root inode chunk
>> Phase 3 - for each AG...
>> - scan (but don't clear) agi unlinked lists...
>> - process known inodes and perform inode discovery...
>> - agno = 0
>>
>> fatal error -- illegal state 13 in block map 376
>>
>> Can anyone provide some insight?
>
> Looks like I missed a couple of extent states in process_bmbt_reclist_int.
>
> What happens if you add the following (only compile tested) patch to
> xfsprogs?

This is what happens:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
data fork in regular inode 134 claims CoW block 376
correcting nextents for inode 134
bad data fork in inode 134
would have cleared inode 134
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
unknown block state, ag 0, block 376
unknown block state, ag 1, block 16
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
entry "testfile2" in shortform directory 128 references free inode 134
- agno = 3
would have junked entry "testfile2" in directory inode 128
imap claims in-use inode 134 is free, would correct imap
Missing reverse-mapping record for (0/376) len 1 owner 134 off 19
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
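
For comparing runs before and after the patch, a small sketch that tallies the block-state complaints in a saved `xfs_repair -n` log may help. The function name `summarize_repair_log` is made up for illustration, and the patterns are taken only from the messages quoted in this thread, so other xfs_repair messages are not counted.

```shell
# Sketch only: count block-state complaints in an "xfs_repair -n" log
# read from stdin. summarize_repair_log is a hypothetical helper name;
# the patterns match only the messages seen in this thread.
summarize_repair_log() {
    awk '
        /illegal state [0-9]+ in block map/ { illegal++ }
        /unknown block state/               { unknown++ }
        /claims CoW block/                  { cow++ }
        END {
            printf "illegal_state=%d unknown_block_state=%d cow_claims=%d\n",
                   illegal, unknown, cow
        }'
}
```

Usage would be something like `xfs_repair -n $SCRATCH_DEV 2>&1 | summarize_repair_log`.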
>
> (Normally I'd say send a metadump too for us mere mortals to work with,
> though I'm about to plunge into weddingland so I likely won't be able to
> do much until the 18th.)
>
Attached (used xfs_metadump -ao)
Soon we will all be gods with powers to replay history ;)
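
For anyone who wants to look at the attached metadump without replaying the IO log, a rough sketch of restoring it to an image file and checking it read-only follows. The helper names `run` and `restore_and_check` and the image path are illustrative assumptions; by default the script only prints the commands (DRY_RUN=1).

```shell
# Sketch only: restore a bzip2-compressed metadump and check it read-only.
# run and restore_and_check are hypothetical helpers.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

restore_and_check() {
    dump_bz2=$1              # e.g. metadump.xfs.testfile5.mark0.bz2
    image=$2                 # assumed output path for the restored image
    dump=${dump_bz2%.bz2}    # name of the decompressed metadump
    run bunzip2 -k "$dump_bz2" &&
    run xfs_mdrestore "$dump" "$image" &&
    run xfs_repair -n -f "$image"   # -f: target is a regular file
}
```

Note that `xfs_repair -n` exits non-zero when it finds corruption, so a failing last step is the expected result for this image.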
> ((Eric: If this doesn't turn out to be a totally garbage patch, feel
> free to add it to xfsprogs.))
>
> --D
>
[-- Attachment #2: metadump.xfs.testfile5.mark0.bz2 --]
[-- Type: application/x-bzip2, Size: 80357 bytes --]