linux-xfs.vger.kernel.org archive mirror
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>, linux-xfs <linux-xfs@vger.kernel.org>
Subject: Re: xfs clones crash issue - illegal state 13 in block map
Date: Thu, 7 Sep 2017 09:13:30 -0700
Message-ID: <20170907161330.GA6540@magnolia>
In-Reply-To: <CAOQ4uxgCanEpWx20QvvLvwBbjJiJ2U3oz-e1ZXrv3ZV2d3EPRQ@mail.gmail.com>

On Thu, Sep 07, 2017 at 03:58:56PM +0300, Amir Goldstein wrote:
> Hi guys,
> 
> I am getting these errors often when running the crash tests
> with cloned files (generic/502 in my xfstests patches).
> 
> Hitting these errors requires first fixing 2 other issues
> that shadow over this issue:
> "xfs: fix incorrect log_flushed on fsync" (in master)
> "xfs: fix leftover CoW extent after truncate"
> available on my tree based on Darrick's simple fix:
> https://github.com/amir73il/linux/commits/xfs-fsync
> 
> I get the errors more often (1 out of 5) on a 100G fs on spinning disk.
> On a 10G fs on SSD they are less frequent.
> The log in this email was captured on patched stable 4.9.47 kernel,
> but I am getting the same errors on patched upstream kernel.
> 
> I wasn't able to create a deterministic reproducer, so attaching
> the full log from a failed test along with an IO log that can be
> replayed on your disk to examine the outcome.
> 
> Following is the output of fsx process #5, which is the process
> that wrote the problematic testfile5.mark0 to the log.
> This process performs only read,zero,fsync before creating
> the log mark.
> The file testfile5 was cloned from an origin 256K file before
> running fsx.
> Later, I used the random seed 35484 in this log for all
> processes and it seemed to increase the probability for failure.
> 
> # /old/home/amir/src/xfstests-dev/ltp/fsx -N 100 -d -k -P
> /mnt/test/fsxtests -i /dev/mapper/logwrites-test -S 0 -j 5
> /mnt/scratch/testfile5
> Seed set to 35484
> file_size=262144
> 5: 1 read 0x3f959 thru 0x3ffff (0x6a7 bytes)
> 5: 2 zero from 0x3307e to 0x34f74, (0x1ef6 bytes)
> 5: 3 fsync
> 5: Dumped fsync buffer to testfile5.mark0
> 
> In order to get to the crash state you need to get my
> xfstests replay-log patches and replay the attached log
> on a >= 100G scratch device:
> 
> # ./src/log-writes/replay-log --log log.xfs.testfile5.mark0 --replay
> $SCRATCH_DEV --end-mark testfile5.mark0
> # mount $SCRATCH_DEV $SCRATCH_MNT
> # umount $SCRATCH_MNT
> # xfs_repair -n $SCRATCH_DEV
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan (but don't clear) agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
> 
> fatal error -- illegal state 13 in block map 376
> 
> Can anyone provide some insight?

Looks like I missed a couple of extent states in process_bmbt_reclist_int.

What happens if you add the following (only compile tested) patch to
xfsprogs?

(Normally I'd say send a metadump too for us mere mortals to work with,
though I'm about to plunge into weddingland so I likely won't be able to
do much until the 18th.)

((Eric: If this doesn't turn out to be a totally garbage patch, feel
free to add it to xfsprogs.))

--D

xfs_repair: handle missing extent states

Missed a couple of the new extent states in the bmbt processing, so add
them to avoid aborting xfs_repair.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/dinode.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/repair/dinode.c b/repair/dinode.c
index f817b5a..b35a523 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -796,6 +796,7 @@ _("%s fork in ino %" PRIu64 " claims free block %" PRIu64 "\n"),
 			case XR_E_FS_MAP:
 			case XR_E_INO:
 			case XR_E_INUSE_FS:
+			case XR_E_REFC:
 				do_warn(
 _("%s fork in inode %" PRIu64 " claims metadata block %" PRIu64 "\n"),
 					forkname, ino, b);
@@ -812,6 +813,12 @@ _("%s fork in %s inode %" PRIu64 " claims used block %" PRIu64 "\n"),
 					forkname, ftype, ino, b);
 				goto done;
 
+			case XR_E_COW:
+				do_warn(
+_("%s fork in %s inode %" PRIu64 " claims CoW block %" PRIu64 "\n"),
+					forkname, ftype, ino, b);
+				goto done;
+
 			default:
 				do_error(
 _("illegal state %d in block map %" PRIu64 "\n"),
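
For readers who do not have the xfsprogs tree at hand, the failure mode the patch addresses can be sketched in isolation. This is a simplified stand-in, not the real process_bmbt_reclist_int: the enum names, their numeric values, and the check_claim() helper are all hypothetical, and only the shape of the switch mirrors the diff above. A block state with no case of its own falls through to the default branch, which is where the "illegal state" abort comes from; giving the refcount and CoW states their own cases turns that abort into a warning.

```c
#include <stdio.h>

/* Hypothetical block states; names and values are illustrative only. */
enum blk_state { BLK_FREE, BLK_INUSE, BLK_METADATA, BLK_REFC, BLK_COW, BLK_BOGUS };

/* Outcome of checking one block claimed by an inode fork. */
enum verdict { V_OK, V_WARN, V_FATAL };

static enum verdict check_claim(enum blk_state state, unsigned long long blk)
{
	switch (state) {
	case BLK_FREE:
		/* Nobody else owns the block, so the claim is acceptable. */
		return V_OK;
	case BLK_METADATA:
	case BLK_REFC:
		/* Refcount btree blocks are grouped with other metadata. */
		fprintf(stderr, "fork claims metadata block %llu\n", blk);
		return V_WARN;
	case BLK_INUSE:
		fprintf(stderr, "fork claims used block %llu\n", blk);
		return V_WARN;
	case BLK_COW:
		/* Without this case, a CoW block would reach the default branch. */
		fprintf(stderr, "fork claims CoW block %llu\n", blk);
		return V_WARN;
	default:
		/* Any state without its own case ends up here and is fatal. */
		fprintf(stderr, "illegal state %d in block map %llu\n",
			(int)state, blk);
		return V_FATAL;
	}
}
```

In this sketch, check_claim(BLK_COW, 376) returns V_WARN, while an unlisted state such as BLK_BOGUS still returns V_FATAL, which is the behaviour the default branch is meant to keep for genuinely unknown states.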

Thread overview: 8+ messages
2017-09-07 12:58 xfs clones crash issue - illegal state 13 in block map Amir Goldstein
2017-09-07 16:13 ` Darrick J. Wong [this message]
2017-09-08  8:34   ` Amir Goldstein
2017-09-19  5:38     ` Darrick J. Wong
2017-09-19  6:16       ` Amir Goldstein
2017-10-09 12:48       ` Hou Tao
2017-10-10 19:18         ` Darrick J. Wong
2017-11-22 18:25           ` Darrick J. Wong
