linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Brian Foster <bfoster@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>, linux-xfs@vger.kernel.org
Subject: Re: t_firstblock assert triggered from dedup in generic/051
Date: Wed, 11 Jul 2018 12:43:21 -0700	[thread overview]
Message-ID: <20180711194321.GJ5724@magnolia> (raw)
In-Reply-To: <20180711193821.GA6841@killians.ges.redhat.com>

On Wed, Jul 11, 2018 at 03:38:21PM -0400, Brian Foster wrote:
> On Wed, Jul 11, 2018 at 12:09:25PM -0700, Darrick J. Wong wrote:
> > On Wed, Jul 11, 2018 at 08:42:41PM +0200, Christoph Hellwig wrote:
> > > I ran into this when testing the rebase of my cow optimization
> > > series to the 4.19-merge branch, but from a quick glance it really
> > > looks like the t_firstblock series might be at faul.  Feel free
> > > to discard if you think it is not, I will retest with my patches
> > > tomorrow.
> > > 
> > > [  635.600041] run fstests generic/051 at 2018-07-11 18:25:22
> > > [  635.802443] XFS (vdb): Mounting V5 Filesystem
> > > [  635.814103] XFS (vdb): Ending clean mount
> > > [  635.992757] XFS (vdc): Mounting V5 Filesystem
> > > [  636.000955] XFS (vdc): Ending clean mount
> > > [  636.031583] XFS (vdc): Unmounting Filesystem
> > > [  636.115131] XFS (vdc): Mounting V5 Filesystem
> > > [  636.123271] XFS (vdc): Ending clean mount
> > > [  639.538434] XFS: Assertion failed: tp->t_firstblock == NULLFSBLOCK, file: fs/xfs/libxfs5
> > > [  639.539893] ------------[ cut here ]------------
> > > [  639.540516] kernel BUG at fs/xfs/xfs_message.c:102!
> > > [  639.541168] invalid opcode: 0000 [#1] SMP PTI
> > > [  639.541814] CPU: 2 PID: 20757 Comm: fsstress Not tainted 4.18.0-rc4+ #3969
> > > [  639.542744] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/4
> > > [  639.543848] RIP: 0010:assfail+0x23/0x30
> > > [  639.544470] Code: c3 66 0f 1f 44 00 00 48 89 f1 41 89 d0 48 c7 c6 88 e0 8c 82 48 89 fa  
> > > [  639.546396] RSP: 0018:ffff88012dc43c08 EFLAGS: 00010202
> > > [  639.546899] RAX: 0000000000000000 RBX: ffff88012dc43ca0 RCX: 0000000000000000
> > > [  639.547582] RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828480eb
> > > [  639.548273] RBP: ffff88012aa92758 R08: 0000000000000000 R09: 0000000000000000
> > > [  639.548954] R10: 0000000000000000 R11: f000000000000000 R12: 0000000000000000
> > > [  639.549704] R13: ffff88012dc43d48 R14: ffff88013092e7e8 R15: 0000000000000014
> > > [  639.550397] FS:  00007f8d689b8e80(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
> > > [  639.551166] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [  639.551795] CR2: 00007f8d689c7000 CR3: 000000012ba6a000 CR4: 00000000000006e0
> > > [  639.552604] Call Trace:
> > > [  639.552896]  xfs_defer_init+0xff/0x160
> > > [  639.553331]  xfs_reflink_remap_extent+0x31b/0xa00
> > 
> > Hmmm, this is one of those functions that does:
> > 
> > xfs_trans_alloc
> > while(...)
> > 	xfs_defer_init
> > 	...
> > 	xfs_defer_finish
> > xfs_trans_commit
> > 
> > Hmm, we set t_firstblock to NULLFSBLOCK in _trans_alloc and _trans_dup,
> > which should take care of the preconditions and the defer_finis...
> > 
> > ...oh, you know what I think it is?  We don't roll the transaction after
> > successfully finishing the while (xfs_defer_has_unfinished_work) loop in
> > _defer_finish, which means that t_firstblock can be non-null at the
> > bottom of that loop, which means that we then trip the assert at the top
> > of the loop, just like you saw.
> > 
> > An extra downside is that _defer_finish returns with a dirty transaction,
> > which means that whatever happens afterwards can (in theory) exceed the
> > transaction reservation and blow up.  Yikes.
> > 
> 
> My first thought was whether we could get through that loop with an
> allocation or something but without deferring anything, but what you
> describe above seems much more likely. Given the next series of
> refactoring buries the final defer_finish() in trans_commit(), I'm
> wondering if we should just have a defer_finish() wrapper that rolls the
> tx a final time, which we can bypass in the trans_commit() case because
> we aren't making any more changes to the transaction. Hm?

That sounds like a reasonable approach.  In the meantime I'm testing a
patch (below) to fix(?) the problem on 4.9 - current kernels.  If it
tests ok I'll send it for reals to the list.

--D

Christoph Hellwig reported seeing the following assertion in
generic/051:

XFS: Assertion failed: tp->t_firstblock == NULLFSBLOCK, file: fs/xfs/libxfs5
------------[ cut here ]------------
kernel BUG at fs/xfs/xfs_message.c:102!
invalid opcode: 0000 [#1] SMP PTI
CPU: 2 PID: 20757 Comm: fsstress Not tainted 4.18.0-rc4+ #3969
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/4
RIP: 0010:assfail+0x23/0x30
Code: c3 66 0f 1f 44 00 00 48 89 f1 41 89 d0 48 c7 c6 88 e0 8c 82 48 89 fa  
RSP: 0018:ffff88012dc43c08 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff88012dc43ca0 RCX: 0000000000000000
RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828480eb
RBP: ffff88012aa92758 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: f000000000000000 R12: 0000000000000000
R13: ffff88012dc43d48 R14: ffff88013092e7e8 R15: 0000000000000014
FS:  00007f8d689b8e80(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8d689c7000 CR3: 000000012ba6a000 CR4: 00000000000006e0
Call Trace:
 xfs_defer_init+0xff/0x160
 xfs_reflink_remap_extent+0x31b/0xa00
 xfs_reflink_remap_blocks+0xec/0x4a0
 xfs_reflink_remap_range+0x3a1/0x650
 xfs_file_dedupe_range+0x39/0x50
 vfs_dedupe_file_range+0x218/0x260
 do_vfs_ioctl+0x262/0x6a0
 ? __se_sys_newfstat+0x3c/0x60
 ksys_ioctl+0x35/0x60
 __x64_sys_ioctl+0x11/0x20
 do_syscall_64+0x4b/0x190
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The root cause of the assertion failure is that xfs_defer_finish doesn't
roll the transaction after processing all the deferred items.  Therefore
it returns a dirty transaction to the caller, which leaves the caller at
risk of exceeding the transaction reservation if it logs more items.

Brian Foster's patchset to move the defer_ops firstblock into the
transaction requires t_firstblock == NULLFSBLOCK upon defer_ops
initialization, which is how this was noticed at all.

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_defer.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 2713e2d808a7..9bab0fb23761 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -424,6 +424,11 @@ xfs_defer_finish(
 			cleanup_fn(*tp, state, error);
 	}
 
+	/*
+	 * Roll the transaction once more to avoid returning to the caller
+	 * with a dirty transaction.
+	 */
+	error = xfs_defer_trans_roll(tp, dop);
 out:
 	(*tp)->t_dfops = orig_dop;
 	if (error)

  reply	other threads:[~2018-07-11 19:49 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-11 18:42 t_firstblock assert triggered from dedup in generic/051 Christoph Hellwig
2018-07-11 19:09 ` Darrick J. Wong
2018-07-11 19:38   ` Brian Foster
2018-07-11 19:43     ` Darrick J. Wong [this message]
2018-07-12  0:25       ` Darrick J. Wong
2018-07-12 12:06         ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180711194321.GJ5724@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=bfoster@redhat.com \
    --cc=hch@lst.de \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).