t_firstblock assert triggered from dedup in generic/051

linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* t_firstblock assert triggered from dedup in generic/051
@ 2018-07-11 18:42 Christoph Hellwig
  2018-07-11 19:09 ` Darrick J. Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2018-07-11 18:42 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

I ran into this when testing the rebase of my cow optimization
series to the 4.19-merge branch, but from a quick glance it really
looks like the t_firstblock series might be at faul.  Feel free
to discard if you think it is not, I will retest with my patches
tomorrow.

[  635.600041] run fstests generic/051 at 2018-07-11 18:25:22
[  635.802443] XFS (vdb): Mounting V5 Filesystem
[  635.814103] XFS (vdb): Ending clean mount
[  635.992757] XFS (vdc): Mounting V5 Filesystem
[  636.000955] XFS (vdc): Ending clean mount
[  636.031583] XFS (vdc): Unmounting Filesystem
[  636.115131] XFS (vdc): Mounting V5 Filesystem
[  636.123271] XFS (vdc): Ending clean mount
[  639.538434] XFS: Assertion failed: tp->t_firstblock == NULLFSBLOCK, file: fs/xfs/libxfs5
[  639.539893] ------------[ cut here ]------------
[  639.540516] kernel BUG at fs/xfs/xfs_message.c:102!
[  639.541168] invalid opcode: 0000 [#1] SMP PTI
[  639.541814] CPU: 2 PID: 20757 Comm: fsstress Not tainted 4.18.0-rc4+ #3969
[  639.542744] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/4
[  639.543848] RIP: 0010:assfail+0x23/0x30
[  639.544470] Code: c3 66 0f 1f 44 00 00 48 89 f1 41 89 d0 48 c7 c6 88 e0 8c 82 48 89 fa  
[  639.546396] RSP: 0018:ffff88012dc43c08 EFLAGS: 00010202
[  639.546899] RAX: 0000000000000000 RBX: ffff88012dc43ca0 RCX: 0000000000000000
[  639.547582] RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828480eb
[  639.548273] RBP: ffff88012aa92758 R08: 0000000000000000 R09: 0000000000000000
[  639.548954] R10: 0000000000000000 R11: f000000000000000 R12: 0000000000000000
[  639.549704] R13: ffff88012dc43d48 R14: ffff88013092e7e8 R15: 0000000000000014
[  639.550397] FS:  00007f8d689b8e80(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
[  639.551166] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  639.551795] CR2: 00007f8d689c7000 CR3: 000000012ba6a000 CR4: 00000000000006e0
[  639.552604] Call Trace:
[  639.552896]  xfs_defer_init+0xff/0x160
[  639.553331]  xfs_reflink_remap_extent+0x31b/0xa00
[  639.553887]  xfs_reflink_remap_blocks+0xec/0x4a0
[  639.554594]  xfs_reflink_remap_range+0x3a1/0x650
[  639.555090]  xfs_file_dedupe_range+0x39/0x50
[  639.555507]  vfs_dedupe_file_range+0x218/0x260
[  639.555943]  do_vfs_ioctl+0x262/0x6a0
[  639.556304]  ? __se_sys_newfstat+0x3c/0x60
[  639.556698]  ksys_ioctl+0x35/0x60
[  639.557024]  __x64_sys_ioctl+0x11/0x20
[  639.557392]  do_syscall_64+0x4b/0x190
[  639.557752]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  639.558271] RIP: 0033:0x7f8d67ec9e07
[  639.558621] Code: 00 00 00 48 8b 05 91 80 2b 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff  
[  639.560447] RSP: 002b:00007ffd6f26e8c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  639.561158] RAX: ffffffffffffffda RBX: 0000000000000044 RCX: 00007f8d67ec9e07
[  639.561807] RDX: 000055f7e6801970 RSI: 00000000c0189436 RDI: 0000000000000003
[  639.562453] RBP: 000000000000000b R08: 000055f7e67ded60 R09: 000055f7e67ded4c
[  639.563098] R10: 0000000000000078 R11: 0000000000000246 R12: 0000000000017000
[  639.563740] R13: 000055f7e67dec30 R14: 000055f7e681d118 R15: 0000000000000044
[  639.564436] Modules linked in:
[  639.564929] ---[ end trace 67fc0dc4de4b6f64 ]---
[  639.565471] RIP: 0010:assfail+0x23/0x30
[  639.565947] Code: c3 66 0f 1f 44 00 00 48 89 f1 41 89 d0 48 c7 c6 88 e0 8c 82 48 89 fa  
[  639.568255] RSP: 0018:ffff88012dc43c08 EFLAGS: 00010202
[  639.568823] RAX: 0000000000000000 RBX: ffff88012dc43ca0 RCX: 0000000000000000
[  639.569604] RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828480eb
[  639.570369] RBP: ffff88012aa92758 R08: 0000000000000000 R09: 0000000000000000
[  639.571161] R10: 0000000000000000 R11: f000000000000000 R12: 0000000000000000
[  639.571928] R13: ffff88012dc43d48 R14: ffff88013092e7e8 R15: 0000000000000014
[  639.572678] FS:  00007f8d689b8e80(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: t_firstblock assert triggered from dedup in generic/051
  2018-07-11 18:42 t_firstblock assert triggered from dedup in generic/051 Christoph Hellwig
@ 2018-07-11 19:09 ` Darrick J. Wong
  2018-07-11 19:38   ` Brian Foster
  0 siblings, 1 reply; 6+ messages in thread
From: Darrick J. Wong @ 2018-07-11 19:09 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Brian Foster, linux-xfs

On Wed, Jul 11, 2018 at 08:42:41PM +0200, Christoph Hellwig wrote:
> I ran into this when testing the rebase of my cow optimization
> series to the 4.19-merge branch, but from a quick glance it really
> looks like the t_firstblock series might be at faul.  Feel free
> to discard if you think it is not, I will retest with my patches
> tomorrow.
> 
> [  635.600041] run fstests generic/051 at 2018-07-11 18:25:22
> [  635.802443] XFS (vdb): Mounting V5 Filesystem
> [  635.814103] XFS (vdb): Ending clean mount
> [  635.992757] XFS (vdc): Mounting V5 Filesystem
> [  636.000955] XFS (vdc): Ending clean mount
> [  636.031583] XFS (vdc): Unmounting Filesystem
> [  636.115131] XFS (vdc): Mounting V5 Filesystem
> [  636.123271] XFS (vdc): Ending clean mount
> [  639.538434] XFS: Assertion failed: tp->t_firstblock == NULLFSBLOCK, file: fs/xfs/libxfs5
> [  639.539893] ------------[ cut here ]------------
> [  639.540516] kernel BUG at fs/xfs/xfs_message.c:102!
> [  639.541168] invalid opcode: 0000 [#1] SMP PTI
> [  639.541814] CPU: 2 PID: 20757 Comm: fsstress Not tainted 4.18.0-rc4+ #3969
> [  639.542744] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/4
> [  639.543848] RIP: 0010:assfail+0x23/0x30
> [  639.544470] Code: c3 66 0f 1f 44 00 00 48 89 f1 41 89 d0 48 c7 c6 88 e0 8c 82 48 89 fa  
> [  639.546396] RSP: 0018:ffff88012dc43c08 EFLAGS: 00010202
> [  639.546899] RAX: 0000000000000000 RBX: ffff88012dc43ca0 RCX: 0000000000000000
> [  639.547582] RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828480eb
> [  639.548273] RBP: ffff88012aa92758 R08: 0000000000000000 R09: 0000000000000000
> [  639.548954] R10: 0000000000000000 R11: f000000000000000 R12: 0000000000000000
> [  639.549704] R13: ffff88012dc43d48 R14: ffff88013092e7e8 R15: 0000000000000014
> [  639.550397] FS:  00007f8d689b8e80(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
> [  639.551166] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  639.551795] CR2: 00007f8d689c7000 CR3: 000000012ba6a000 CR4: 00000000000006e0
> [  639.552604] Call Trace:
> [  639.552896]  xfs_defer_init+0xff/0x160
> [  639.553331]  xfs_reflink_remap_extent+0x31b/0xa00

Hmmm, this is one of those functions that does:

xfs_trans_alloc
while(...)
	xfs_defer_init
	...
	xfs_defer_finish
xfs_trans_commit

Hmm, we set t_firstblock to NULLFSBLOCK in _trans_alloc and _trans_dup,
which should take care of the preconditions and the defer_finis...

...oh, you know what I think it is?  We don't roll the transaction after
successfully finishing the while (xfs_defer_has_unfinished_work) loop in
_defer_finish, which means that t_firstblock can be non-null at the
bottom of that loop, which means that we then trip the assert at the top
of the loop, just like you saw.

An extra downside is that _defer_finish returns with a dirty transaction,
which means that whatever happens afterwards can (in theory) exceed the
transaction reservation and blow up.  Yikes.

--D

> [  639.553887]  xfs_reflink_remap_blocks+0xec/0x4a0
> [  639.554594]  xfs_reflink_remap_range+0x3a1/0x650
> [  639.555090]  xfs_file_dedupe_range+0x39/0x50
> [  639.555507]  vfs_dedupe_file_range+0x218/0x260
> [  639.555943]  do_vfs_ioctl+0x262/0x6a0
> [  639.556304]  ? __se_sys_newfstat+0x3c/0x60
> [  639.556698]  ksys_ioctl+0x35/0x60
> [  639.557024]  __x64_sys_ioctl+0x11/0x20
> [  639.557392]  do_syscall_64+0x4b/0x190
> [  639.557752]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [  639.558271] RIP: 0033:0x7f8d67ec9e07
> [  639.558621] Code: 00 00 00 48 8b 05 91 80 2b 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff  
> [  639.560447] RSP: 002b:00007ffd6f26e8c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [  639.561158] RAX: ffffffffffffffda RBX: 0000000000000044 RCX: 00007f8d67ec9e07
> [  639.561807] RDX: 000055f7e6801970 RSI: 00000000c0189436 RDI: 0000000000000003
> [  639.562453] RBP: 000000000000000b R08: 000055f7e67ded60 R09: 000055f7e67ded4c
> [  639.563098] R10: 0000000000000078 R11: 0000000000000246 R12: 0000000000017000
> [  639.563740] R13: 000055f7e67dec30 R14: 000055f7e681d118 R15: 0000000000000044
> [  639.564436] Modules linked in:
> [  639.564929] ---[ end trace 67fc0dc4de4b6f64 ]---
> [  639.565471] RIP: 0010:assfail+0x23/0x30
> [  639.565947] Code: c3 66 0f 1f 44 00 00 48 89 f1 41 89 d0 48 c7 c6 88 e0 8c 82 48 89 fa  
> [  639.568255] RSP: 0018:ffff88012dc43c08 EFLAGS: 00010202
> [  639.568823] RAX: 0000000000000000 RBX: ffff88012dc43ca0 RCX: 0000000000000000
> [  639.569604] RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828480eb
> [  639.570369] RBP: ffff88012aa92758 R08: 0000000000000000 R09: 0000000000000000
> [  639.571161] R10: 0000000000000000 R11: f000000000000000 R12: 0000000000000000
> [  639.571928] R13: ffff88012dc43d48 R14: ffff88013092e7e8 R15: 0000000000000014
> [  639.572678] FS:  00007f8d689b8e80(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: t_firstblock assert triggered from dedup in generic/051
  2018-07-11 19:09 ` Darrick J. Wong
@ 2018-07-11 19:38   ` Brian Foster
  2018-07-11 19:43     ` Darrick J. Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Brian Foster @ 2018-07-11 19:38 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Wed, Jul 11, 2018 at 12:09:25PM -0700, Darrick J. Wong wrote:
> On Wed, Jul 11, 2018 at 08:42:41PM +0200, Christoph Hellwig wrote:
> > I ran into this when testing the rebase of my cow optimization
> > series to the 4.19-merge branch, but from a quick glance it really
> > looks like the t_firstblock series might be at faul.  Feel free
> > to discard if you think it is not, I will retest with my patches
> > tomorrow.
> > 
> > [  635.600041] run fstests generic/051 at 2018-07-11 18:25:22
> > [  635.802443] XFS (vdb): Mounting V5 Filesystem
> > [  635.814103] XFS (vdb): Ending clean mount
> > [  635.992757] XFS (vdc): Mounting V5 Filesystem
> > [  636.000955] XFS (vdc): Ending clean mount
> > [  636.031583] XFS (vdc): Unmounting Filesystem
> > [  636.115131] XFS (vdc): Mounting V5 Filesystem
> > [  636.123271] XFS (vdc): Ending clean mount
> > [  639.538434] XFS: Assertion failed: tp->t_firstblock == NULLFSBLOCK, file: fs/xfs/libxfs5
> > [  639.539893] ------------[ cut here ]------------
> > [  639.540516] kernel BUG at fs/xfs/xfs_message.c:102!
> > [  639.541168] invalid opcode: 0000 [#1] SMP PTI
> > [  639.541814] CPU: 2 PID: 20757 Comm: fsstress Not tainted 4.18.0-rc4+ #3969
> > [  639.542744] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/4
> > [  639.543848] RIP: 0010:assfail+0x23/0x30
> > [  639.544470] Code: c3 66 0f 1f 44 00 00 48 89 f1 41 89 d0 48 c7 c6 88 e0 8c 82 48 89 fa  
> > [  639.546396] RSP: 0018:ffff88012dc43c08 EFLAGS: 00010202
> > [  639.546899] RAX: 0000000000000000 RBX: ffff88012dc43ca0 RCX: 0000000000000000
> > [  639.547582] RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828480eb
> > [  639.548273] RBP: ffff88012aa92758 R08: 0000000000000000 R09: 0000000000000000
> > [  639.548954] R10: 0000000000000000 R11: f000000000000000 R12: 0000000000000000
> > [  639.549704] R13: ffff88012dc43d48 R14: ffff88013092e7e8 R15: 0000000000000014
> > [  639.550397] FS:  00007f8d689b8e80(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
> > [  639.551166] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  639.551795] CR2: 00007f8d689c7000 CR3: 000000012ba6a000 CR4: 00000000000006e0
> > [  639.552604] Call Trace:
> > [  639.552896]  xfs_defer_init+0xff/0x160
> > [  639.553331]  xfs_reflink_remap_extent+0x31b/0xa00
> 
> Hmmm, this is one of those functions that does:
> 
> xfs_trans_alloc
> while(...)
> 	xfs_defer_init
> 	...
> 	xfs_defer_finish
> xfs_trans_commit
> 
> Hmm, we set t_firstblock to NULLFSBLOCK in _trans_alloc and _trans_dup,
> which should take care of the preconditions and the defer_finis...
> 
> ...oh, you know what I think it is?  We don't roll the transaction after
> successfully finishing the while (xfs_defer_has_unfinished_work) loop in
> _defer_finish, which means that t_firstblock can be non-null at the
> bottom of that loop, which means that we then trip the assert at the top
> of the loop, just like you saw.
> 
> An extra downside is that _defer_finish returns with a dirty transaction,
> which means that whatever happens afterwards can (in theory) exceed the
> transaction reservation and blow up.  Yikes.
> 

My first thought was whether we could get through that loop with an
allocation or something but without deferring anything, but what you
describe above seems much more likely. Given the next series of
refactoring buries the final defer_finish() in trans_commit(), I'm
wondering if we should just have a defer_finish() wrapper that rolls the
tx a final time, which we can bypass in the trans_commit() case because
we aren't making any more changes to the transaction. Hm?

Brian

> --D
> 
> > [  639.553887]  xfs_reflink_remap_blocks+0xec/0x4a0
> > [  639.554594]  xfs_reflink_remap_range+0x3a1/0x650
> > [  639.555090]  xfs_file_dedupe_range+0x39/0x50
> > [  639.555507]  vfs_dedupe_file_range+0x218/0x260
> > [  639.555943]  do_vfs_ioctl+0x262/0x6a0
> > [  639.556304]  ? __se_sys_newfstat+0x3c/0x60
> > [  639.556698]  ksys_ioctl+0x35/0x60
> > [  639.557024]  __x64_sys_ioctl+0x11/0x20
> > [  639.557392]  do_syscall_64+0x4b/0x190
> > [  639.557752]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > [  639.558271] RIP: 0033:0x7f8d67ec9e07
> > [  639.558621] Code: 00 00 00 48 8b 05 91 80 2b 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff  
> > [  639.560447] RSP: 002b:00007ffd6f26e8c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > [  639.561158] RAX: ffffffffffffffda RBX: 0000000000000044 RCX: 00007f8d67ec9e07
> > [  639.561807] RDX: 000055f7e6801970 RSI: 00000000c0189436 RDI: 0000000000000003
> > [  639.562453] RBP: 000000000000000b R08: 000055f7e67ded60 R09: 000055f7e67ded4c
> > [  639.563098] R10: 0000000000000078 R11: 0000000000000246 R12: 0000000000017000
> > [  639.563740] R13: 000055f7e67dec30 R14: 000055f7e681d118 R15: 0000000000000044
> > [  639.564436] Modules linked in:
> > [  639.564929] ---[ end trace 67fc0dc4de4b6f64 ]---
> > [  639.565471] RIP: 0010:assfail+0x23/0x30
> > [  639.565947] Code: c3 66 0f 1f 44 00 00 48 89 f1 41 89 d0 48 c7 c6 88 e0 8c 82 48 89 fa  
> > [  639.568255] RSP: 0018:ffff88012dc43c08 EFLAGS: 00010202
> > [  639.568823] RAX: 0000000000000000 RBX: ffff88012dc43ca0 RCX: 0000000000000000
> > [  639.569604] RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828480eb
> > [  639.570369] RBP: ffff88012aa92758 R08: 0000000000000000 R09: 0000000000000000
> > [  639.571161] R10: 0000000000000000 R11: f000000000000000 R12: 0000000000000000
> > [  639.571928] R13: ffff88012dc43d48 R14: ffff88013092e7e8 R15: 0000000000000014
> > [  639.572678] FS:  00007f8d689b8e80(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
> > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: t_firstblock assert triggered from dedup in generic/051
  2018-07-11 19:38   ` Brian Foster
@ 2018-07-11 19:43     ` Darrick J. Wong
  2018-07-12  0:25       ` Darrick J. Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Darrick J. Wong @ 2018-07-11 19:43 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs

On Wed, Jul 11, 2018 at 03:38:21PM -0400, Brian Foster wrote:
> On Wed, Jul 11, 2018 at 12:09:25PM -0700, Darrick J. Wong wrote:
> > On Wed, Jul 11, 2018 at 08:42:41PM +0200, Christoph Hellwig wrote:
> > > I ran into this when testing the rebase of my cow optimization
> > > series to the 4.19-merge branch, but from a quick glance it really
> > > looks like the t_firstblock series might be at faul.  Feel free
> > > to discard if you think it is not, I will retest with my patches
> > > tomorrow.
> > > 
> > > [  635.600041] run fstests generic/051 at 2018-07-11 18:25:22
> > > [  635.802443] XFS (vdb): Mounting V5 Filesystem
> > > [  635.814103] XFS (vdb): Ending clean mount
> > > [  635.992757] XFS (vdc): Mounting V5 Filesystem
> > > [  636.000955] XFS (vdc): Ending clean mount
> > > [  636.031583] XFS (vdc): Unmounting Filesystem
> > > [  636.115131] XFS (vdc): Mounting V5 Filesystem
> > > [  636.123271] XFS (vdc): Ending clean mount
> > > [  639.538434] XFS: Assertion failed: tp->t_firstblock == NULLFSBLOCK, file: fs/xfs/libxfs5
> > > [  639.539893] ------------[ cut here ]------------
> > > [  639.540516] kernel BUG at fs/xfs/xfs_message.c:102!
> > > [  639.541168] invalid opcode: 0000 [#1] SMP PTI
> > > [  639.541814] CPU: 2 PID: 20757 Comm: fsstress Not tainted 4.18.0-rc4+ #3969
> > > [  639.542744] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/4
> > > [  639.543848] RIP: 0010:assfail+0x23/0x30
> > > [  639.544470] Code: c3 66 0f 1f 44 00 00 48 89 f1 41 89 d0 48 c7 c6 88 e0 8c 82 48 89 fa  
> > > [  639.546396] RSP: 0018:ffff88012dc43c08 EFLAGS: 00010202
> > > [  639.546899] RAX: 0000000000000000 RBX: ffff88012dc43ca0 RCX: 0000000000000000
> > > [  639.547582] RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828480eb
> > > [  639.548273] RBP: ffff88012aa92758 R08: 0000000000000000 R09: 0000000000000000
> > > [  639.548954] R10: 0000000000000000 R11: f000000000000000 R12: 0000000000000000
> > > [  639.549704] R13: ffff88012dc43d48 R14: ffff88013092e7e8 R15: 0000000000000014
> > > [  639.550397] FS:  00007f8d689b8e80(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
> > > [  639.551166] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [  639.551795] CR2: 00007f8d689c7000 CR3: 000000012ba6a000 CR4: 00000000000006e0
> > > [  639.552604] Call Trace:
> > > [  639.552896]  xfs_defer_init+0xff/0x160
> > > [  639.553331]  xfs_reflink_remap_extent+0x31b/0xa00
> > 
> > Hmmm, this is one of those functions that does:
> > 
> > xfs_trans_alloc
> > while(...)
> > 	xfs_defer_init
> > 	...
> > 	xfs_defer_finish
> > xfs_trans_commit
> > 
> > Hmm, we set t_firstblock to NULLFSBLOCK in _trans_alloc and _trans_dup,
> > which should take care of the preconditions and the defer_finis...
> > 
> > ...oh, you know what I think it is?  We don't roll the transaction after
> > successfully finishing the while (xfs_defer_has_unfinished_work) loop in
> > _defer_finish, which means that t_firstblock can be non-null at the
> > bottom of that loop, which means that we then trip the assert at the top
> > of the loop, just like you saw.
> > 
> > An extra downside is that _defer_finish returns with a dirty transaction,
> > which means that whatever happens afterwards can (in theory) exceed the
> > transaction reservation and blow up.  Yikes.
> > 
> 
> My first thought was whether we could get through that loop with an
> allocation or something but without deferring anything, but what you
> describe above seems much more likely. Given the next series of
> refactoring buries the final defer_finish() in trans_commit(), I'm
> wondering if we should just have a defer_finish() wrapper that rolls the
> tx a final time, which we can bypass in the trans_commit() case because
> we aren't making any more changes to the transaction. Hm?

That sounds like a reasonable approach.  In the meantime I'm testing a
patch (below) to fix(?) the problem on 4.9 - current kernels.  If it
tests ok I'll send it for reals to the list.

--D

Christoph Hellwig reported seeing the following assertion in
generic/051:

XFS: Assertion failed: tp->t_firstblock == NULLFSBLOCK, file: fs/xfs/libxfs5
------------[ cut here ]------------
kernel BUG at fs/xfs/xfs_message.c:102!
invalid opcode: 0000 [#1] SMP PTI
CPU: 2 PID: 20757 Comm: fsstress Not tainted 4.18.0-rc4+ #3969
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/4
RIP: 0010:assfail+0x23/0x30
Code: c3 66 0f 1f 44 00 00 48 89 f1 41 89 d0 48 c7 c6 88 e0 8c 82 48 89 fa  
RSP: 0018:ffff88012dc43c08 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff88012dc43ca0 RCX: 0000000000000000
RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828480eb
RBP: ffff88012aa92758 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: f000000000000000 R12: 0000000000000000
R13: ffff88012dc43d48 R14: ffff88013092e7e8 R15: 0000000000000014
FS:  00007f8d689b8e80(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8d689c7000 CR3: 000000012ba6a000 CR4: 00000000000006e0
Call Trace:
 xfs_defer_init+0xff/0x160
 xfs_reflink_remap_extent+0x31b/0xa00
 xfs_reflink_remap_blocks+0xec/0x4a0
 xfs_reflink_remap_range+0x3a1/0x650
 xfs_file_dedupe_range+0x39/0x50
 vfs_dedupe_file_range+0x218/0x260
 do_vfs_ioctl+0x262/0x6a0
 ? __se_sys_newfstat+0x3c/0x60
 ksys_ioctl+0x35/0x60
 __x64_sys_ioctl+0x11/0x20
 do_syscall_64+0x4b/0x190
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The root cause of the assertion failure is that xfs_defer_finish doesn't
roll the transaction after processing all the deferred items.  Therefore
it returns a dirty transaction to the caller, which leaves the caller at
risk of exceeding the transaction reservation if it logs more items.

Brian Foster's patchset to move the defer_ops firstblock into the
transaction requires t_firstblock == NULLFSBLOCK upon defer_ops
initialization, which is how this was noticed at all.

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_defer.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 2713e2d808a7..9bab0fb23761 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -424,6 +424,11 @@ xfs_defer_finish(
 			cleanup_fn(*tp, state, error);
 	}
 
+	/*
+	 * Roll the transaction once more to avoid returning to the caller
+	 * with a dirty transaction.
+	 */
+	error = xfs_defer_trans_roll(tp, dop);
 out:
 	(*tp)->t_dfops = orig_dop;
 	if (error)

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: t_firstblock assert triggered from dedup in generic/051
  2018-07-11 19:43     ` Darrick J. Wong
@ 2018-07-12  0:25       ` Darrick J. Wong
  2018-07-12 12:06         ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Darrick J. Wong @ 2018-07-12  0:25 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs

On Wed, Jul 11, 2018 at 12:43:21PM -0700, Darrick J. Wong wrote:
> On Wed, Jul 11, 2018 at 03:38:21PM -0400, Brian Foster wrote:
> > On Wed, Jul 11, 2018 at 12:09:25PM -0700, Darrick J. Wong wrote:
> > > On Wed, Jul 11, 2018 at 08:42:41PM +0200, Christoph Hellwig wrote:
> > > > I ran into this when testing the rebase of my cow optimization
> > > > series to the 4.19-merge branch, but from a quick glance it really
> > > > looks like the t_firstblock series might be at faul.  Feel free
> > > > to discard if you think it is not, I will retest with my patches
> > > > tomorrow.
> > > > 
> > > > [  635.600041] run fstests generic/051 at 2018-07-11 18:25:22
> > > > [  635.802443] XFS (vdb): Mounting V5 Filesystem
> > > > [  635.814103] XFS (vdb): Ending clean mount
> > > > [  635.992757] XFS (vdc): Mounting V5 Filesystem
> > > > [  636.000955] XFS (vdc): Ending clean mount
> > > > [  636.031583] XFS (vdc): Unmounting Filesystem
> > > > [  636.115131] XFS (vdc): Mounting V5 Filesystem
> > > > [  636.123271] XFS (vdc): Ending clean mount
> > > > [  639.538434] XFS: Assertion failed: tp->t_firstblock == NULLFSBLOCK, file: fs/xfs/libxfs5
> > > > [  639.539893] ------------[ cut here ]------------
> > > > [  639.540516] kernel BUG at fs/xfs/xfs_message.c:102!
> > > > [  639.541168] invalid opcode: 0000 [#1] SMP PTI
> > > > [  639.541814] CPU: 2 PID: 20757 Comm: fsstress Not tainted 4.18.0-rc4+ #3969
> > > > [  639.542744] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/4
> > > > [  639.543848] RIP: 0010:assfail+0x23/0x30
> > > > [  639.544470] Code: c3 66 0f 1f 44 00 00 48 89 f1 41 89 d0 48 c7 c6 88 e0 8c 82 48 89 fa  
> > > > [  639.546396] RSP: 0018:ffff88012dc43c08 EFLAGS: 00010202
> > > > [  639.546899] RAX: 0000000000000000 RBX: ffff88012dc43ca0 RCX: 0000000000000000
> > > > [  639.547582] RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828480eb
> > > > [  639.548273] RBP: ffff88012aa92758 R08: 0000000000000000 R09: 0000000000000000
> > > > [  639.548954] R10: 0000000000000000 R11: f000000000000000 R12: 0000000000000000
> > > > [  639.549704] R13: ffff88012dc43d48 R14: ffff88013092e7e8 R15: 0000000000000014
> > > > [  639.550397] FS:  00007f8d689b8e80(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
> > > > [  639.551166] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [  639.551795] CR2: 00007f8d689c7000 CR3: 000000012ba6a000 CR4: 00000000000006e0
> > > > [  639.552604] Call Trace:
> > > > [  639.552896]  xfs_defer_init+0xff/0x160
> > > > [  639.553331]  xfs_reflink_remap_extent+0x31b/0xa00
> > > 
> > > Hmmm, this is one of those functions that does:
> > > 
> > > xfs_trans_alloc
> > > while(...)
> > > 	xfs_defer_init
> > > 	...
> > > 	xfs_defer_finish
> > > xfs_trans_commit
> > > 
> > > Hmm, we set t_firstblock to NULLFSBLOCK in _trans_alloc and _trans_dup,
> > > which should take care of the preconditions and the defer_finis...
> > > 
> > > ...oh, you know what I think it is?  We don't roll the transaction after
> > > successfully finishing the while (xfs_defer_has_unfinished_work) loop in
> > > _defer_finish, which means that t_firstblock can be non-null at the
> > > bottom of that loop, which means that we then trip the assert at the top
> > > of the loop, just like you saw.
> > > 
> > > An extra downside is that _defer_finish returns with a dirty transaction,
> > > which means that whatever happens afterwards can (in theory) exceed the
> > > transaction reservation and blow up.  Yikes.
> > > 
> > 
> > My first thought was whether we could get through that loop with an
> > allocation or something but without deferring anything, but what you
> > describe above seems much more likely. Given the next series of
> > refactoring buries the final defer_finish() in trans_commit(), I'm
> > wondering if we should just have a defer_finish() wrapper that rolls the
> > tx a final time, which we can bypass in the trans_commit() case because
> > we aren't making any more changes to the transaction. Hm?
> 
> That sounds like a reasonable approach.  In the meantime I'm testing a
> patch (below) to fix(?) the problem on 4.9 - current kernels.  If it
> tests ok I'll send it for reals to the list.
> 
> Christoph Hellwig reported seeing the following assertion in
> generic/051:
> 
> XFS: Assertion failed: tp->t_firstblock == NULLFSBLOCK, file: fs/xfs/libxfs5
> ------------[ cut here ]------------
> kernel BUG at fs/xfs/xfs_message.c:102!
> invalid opcode: 0000 [#1] SMP PTI
> CPU: 2 PID: 20757 Comm: fsstress Not tainted 4.18.0-rc4+ #3969
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/4
> RIP: 0010:assfail+0x23/0x30
> Code: c3 66 0f 1f 44 00 00 48 89 f1 41 89 d0 48 c7 c6 88 e0 8c 82 48 89 fa  
> RSP: 0018:ffff88012dc43c08 EFLAGS: 00010202
> RAX: 0000000000000000 RBX: ffff88012dc43ca0 RCX: 0000000000000000
> RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828480eb
> RBP: ffff88012aa92758 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: f000000000000000 R12: 0000000000000000
> R13: ffff88012dc43d48 R14: ffff88013092e7e8 R15: 0000000000000014
> FS:  00007f8d689b8e80(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f8d689c7000 CR3: 000000012ba6a000 CR4: 00000000000006e0
> Call Trace:
>  xfs_defer_init+0xff/0x160
>  xfs_reflink_remap_extent+0x31b/0xa00
>  xfs_reflink_remap_blocks+0xec/0x4a0
>  xfs_reflink_remap_range+0x3a1/0x650
>  xfs_file_dedupe_range+0x39/0x50
>  vfs_dedupe_file_range+0x218/0x260
>  do_vfs_ioctl+0x262/0x6a0
>  ? __se_sys_newfstat+0x3c/0x60
>  ksys_ioctl+0x35/0x60
>  __x64_sys_ioctl+0x11/0x20
>  do_syscall_64+0x4b/0x190
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> The root cause of the assertion failure is that xfs_defer_finish doesn't
> roll the transaction after processing all the deferred items.  Therefore
> it returns a dirty transaction to the caller, which leaves the caller at
> risk of exceeding the transaction reservation if it logs more items.
> 
> Brian Foster's patchset to move the defer_ops firstblock into the
> transaction requires t_firstblock == NULLFSBLOCK upon defer_ops
> initialization, which is how this was noticed at all.
> 
> Reported-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_defer.c |    5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 2713e2d808a7..9bab0fb23761 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -424,6 +424,11 @@ xfs_defer_finish(
>  			cleanup_fn(*tp, state, error);
>  	}
>  
> +	/*
> +	 * Roll the transaction once more to avoid returning to the caller
> +	 * with a dirty transaction.
> +	 */
> +	error = xfs_defer_trans_roll(tp, dop);

Heh, it doesn't test ok; back to the drawing board...

XFS: Assertion failed: !test_bit(XFS_LI_DIRTY, &lip->li_flags), file: fs/xfs/xfs_trans.c, line: 740

I think we only need to roll if the transaction is actually dirty, i.e.

	if ((*tp)->t_flags & XFS_TRANS_DIRTY)
		error = xfs_defer_trans_roll(tp, dop);

--D

>  out:
>  	(*tp)->t_dfops = orig_dop;
>  	if (error)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: t_firstblock assert triggered from dedup in generic/051
  2018-07-12  0:25       ` Darrick J. Wong
@ 2018-07-12 12:06         ` Christoph Hellwig
  0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2018-07-12 12:06 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Brian Foster, Christoph Hellwig, linux-xfs

On Wed, Jul 11, 2018 at 05:25:34PM -0700, Darrick J. Wong wrote:
> > +	/*
> > +	 * Roll the transaction once more to avoid returning to the caller
> > +	 * with a dirty transaction.
> > +	 */
> > +	error = xfs_defer_trans_roll(tp, dop);
> 
> Heh, it doesn't test ok; back to the drawing board...
> 
> XFS: Assertion failed: !test_bit(XFS_LI_DIRTY, &lip->li_flags), file: fs/xfs/xfs_trans.c, line: 740

I haven't hit this yet with a full 4k+reflink and 1k+reflink run,
so far things seem to do fine.

> I think we only need to roll if the transaction is actually dirty, i.e.
> 
> 	if ((*tp)->t_flags & XFS_TRANS_DIRTY)
> 		error = xfs_defer_trans_roll(tp, dop);

But independent of me not hitting the assert that looks reasonable.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-07-12 12:13 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-07-11 18:42 t_firstblock assert triggered from dedup in generic/051 Christoph Hellwig
2018-07-11 19:09 ` Darrick J. Wong
2018-07-11 19:38   ` Brian Foster
2018-07-11 19:43     ` Darrick J. Wong
2018-07-12  0:25       ` Darrick J. Wong
2018-07-12 12:06         ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).