* XFS kernel BUG during generic/270 with v4.10
@ 2017-02-22 19:13 Ross Zwisler
  2017-02-22 19:39 ` Ross Zwisler
  2017-03-02 16:29 ` Brian Foster
  0 siblings, 2 replies; 10+ messages in thread
From: Ross Zwisler @ 2017-02-22 19:13 UTC (permalink / raw)
  To: Darrick J. Wong, Brian Foster, linux-xfs

By running generic/270 in a loop on an XFS filesystem mounted with DAX I'm
able to reliably generate the following kernel bug after a few (~10)
iterations (output passed through kasan_symbolize.py):

run fstests generic/270 at 2017-02-22 12:01:05
XFS (pmem0p2): Unmounting Filesystem
XFS (pmem0p2): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
XFS (pmem0p2): Mounting V5 Filesystem
XFS (pmem0p2): Ending clean mount
XFS (pmem0p2): Quotacheck needed: Please wait.
XFS (pmem0p2): Quotacheck: Done.
XFS (pmem0p2): xlog_verify_grant_tail: space > BBTOB(tail_blocks)
XFS: Assertion failed: XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_delayed_blks == 0, file: fs/xfs/xfs_super.c, line: 965
------------[ cut here ]------------
kernel BUG at fs/xfs/xfs_message.c:113!
invalid opcode: 0000 [#1] PREEMPT SMP
Modules linked in: dax_pmem nd_pmem dax nd_btt nd_e820 libnvdimm
CPU: 0 PID: 15817 Comm: 270 Tainted: G W 4.10.0 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
task: ffff88050f988000 task.stack: ffffc9000393c000
RIP: 0010:assfail+0x20/0x30
RSP: 0018:ffffc9000393fb48 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8800aac34ce0 RCX: 0000000000000000
RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff81ec6d80
RBP: ffffc9000393fb48 R08: 0000000000000000 R09: 0000000000000000
R10: 000000000000000a R11: f000000000000000 R12: ffff8800aac34a40
R13: ffffffff81c55100 R14: ffffffff81f1c2fb R15: 000000000000009e
FS: 00007f876cbf8b40(0000) GS:ffff880514800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055dc62780950 CR3: 000000050e587000 CR4: 00000000001406f0
Call Trace:
 [< none >] xfs_fs_destroy_inode+0x283/0x350 fs/xfs/xfs_super.c:965
 [< none >] destroy_inode+0x3b/0x60 fs/inode.c:264
 [< none >] evict+0x139/0x1c0 fs/inode.c:570
 [< none >] dispose_list+0x56/0x80 fs/inode.c:588
 [< none >] prune_icache_sb+0x5a/0x80 fs/inode.c:775
 [< none >] super_cache_scan+0x14e/0x1a0 fs/super.c:102
 [< inline >] do_shrink_slab mm/vmscan.c:378
 [< none >] shrink_slab.part.39+0x216/0x620 mm/vmscan.c:481
 [< none >] shrink_slab+0x29/0x30 mm/vmscan.c:441
 [< none >] drop_slab_node+0x31/0x60 mm/vmscan.c:499
 [< none >] drop_slab+0x3f/0x70 mm/vmscan.c:510
 [< none >] drop_caches_sysctl_handler+0x71/0xc0 fs/drop_caches.c:58
 [< none >] proc_sys_call_handler+0xea/0x110 fs/proc/proc_sysctl.c:548
 [< none >] proc_sys_write+0x14/0x20 fs/proc/proc_sysctl.c:566
 [< none >] __vfs_write+0x37/0x160 fs/read_write.c:510
 ?[< none >] rcu_sync_lockdep_assert+0x12/0x60 kernel/rcu/sync.c:68
 ?[< inline >] percpu_down_read ./include/linux/percpu-rwsem.h:59
 ?[< none >] __sb_start_write+0x10d/0x220 fs/super.c:1291
 ?[< inline >] file_start_write ./include/linux/fs.h:2547
 ?[< none >] vfs_write+0x19b/0x1f0 fs/read_write.c:559
 ?[< none >] security_file_permission+0x3b/0xc0 security/security.c:776
 [< none >] vfs_write+0xcb/0x1f0 fs/read_write.c:560
 [< inline >] SYSC_write fs/read_write.c:607
 [< none >] SyS_write+0x58/0xc0 fs/read_write.c:599
 [< none >] entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:204
RIP: 0033:0x7f876c2e1c30
RSP: 002b:00007ffe6405c148 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007f876c5ab5e0 RCX: 00007f876c2e1c30
RDX: 0000000000000002 RSI: 000055dc62780950 RDI: 0000000000000001
RBP: 0000000000000001 R08: 00007f876c5ac740 R09: 00007f876cbf8b40
R10: 0000000000000073 R11: 0000000000000246 R12: 000055dc62976b90
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
Code: 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 f1 41 89 d0 48 c7 c6 b8 09 f1 81 48 89 fa 31 ff 48 89 e5 e8 b0 f8 ff ff <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
RIP: assfail+0x20/0x30 RSP: ffffc9000393fb48
---[ end trace 384d06985052f068 ]---

Here's the xfstests run:

FSTYP -- xfs (debug)
PLATFORM -- Linux/x86_64 alara 4.10.0
MKFS_OPTIONS -- -f -bsize=4096 /dev/pmem0p2
MOUNT_OPTIONS -- -o dax -o context=system_u:object_r:nfs_t:s0 /dev/pmem0p2 /mnt/xfstests_scratch

generic/270 24s ..../check: line 596: 15817 Segmentation fault ./$seq > $tmp.rawout 2>&1
[failed, exit status 139] - output mismatch (see /root/xfstests/results//generic/270.out.bad)
--- tests/generic/270.out 2016-10-21 15:31:10.568945780 -0600
+++ /root/xfstests/results//generic/270.out.bad 2017-02-22 12:01:29.272718284 -0700
@@ -3,6 +3,3 @@
Run fsstress

Run dd writers in parallel
-Comparing user usage
-Comparing group usage
-Comparing filesystem consistency
...
(Run 'diff -u tests/generic/270.out /root/xfstests/results//generic/270.out.bad' to see the entire diff)

This was done in my normal test setup, which is a pair of PMEM disks that
enable DAX.
Here are the versions of xfstests and xfsprogs that I'm using:

xfstests: f438604 generic: test mmap io through DAX and non-DAX

xfsprogs: xfs_admin version 4.9.0
This is just the xfsprogs that comes packaged with Fedora 25.

Thanks,
- Ross

^ permalink raw reply [flat|nested] 10+ messages in thread
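A minimal sketch of the "run generic/270 in a loop" reproduction described in the report above. The helper name `run_until_failure` and the iteration cap are hypothetical, and the sketch assumes that the xfstests `./check` script exits non-zero when a test fails, which may depend on the xfstests version in use.

```python
import subprocess
import sys


def run_until_failure(cmd, max_iterations=50, cwd=None):
    """Run `cmd` repeatedly until it exits non-zero.

    Returns the 1-based iteration on which the command failed, or None
    if it succeeded `max_iterations` times in a row.
    """
    for iteration in range(1, max_iterations + 1):
        print(f"iteration {iteration}", file=sys.stderr)
        result = subprocess.run(cmd, cwd=cwd)
        if result.returncode != 0:
            return iteration
    return None


# Hypothetical usage, run from the top of an xfstests checkout that
# already has its local.config pointing at the test and scratch devices:
#
#     failed_at = run_until_failure(["./check", "generic/270"])
#     print("first failure on iteration", failed_at)
```

Because the failure here is a kernel BUG rather than an ordinary test failure, the loop is only a convenience for triggering it; the evidence still has to be read from dmesg.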
* Re: XFS kernel BUG during generic/270 with v4.10
  2017-02-22 19:13 XFS kernel BUG during generic/270 with v4.10 Ross Zwisler
@ 2017-02-22 19:39 ` Ross Zwisler
  2017-03-02 16:29 ` Brian Foster
  1 sibling, 0 replies; 10+ messages in thread
From: Ross Zwisler @ 2017-02-22 19:39 UTC (permalink / raw)
  To: Ross Zwisler; +Cc: Darrick J. Wong, Brian Foster, linux-xfs

On Wed, Feb 22, 2017 at 12:13:00PM -0700, Ross Zwisler wrote:
> By running generic/270 in a loop on an XFS filesystem mounted with DAX I'm
> able to reliably generate the following kernel bug after a few (~10)
> iterations (output passed through kasan_symbolize.py):

A few other notes which I should have included in the original report:

1) I initially found this while testing Brian's fix for the WARN_ON I reported
   with this same test [1].

2) After the initial failure I was able to reproduce the issue reliably on
   v4.10.

Based on 1) and 2) my guess is that this failure is independent of the other
issue that I reported and that Brian was trying to fix. They just happen to
show up with the same xfstest.

[1] https://www.spinics.net/lists/linux-xfs/msg04300.html

Thanks,
- Ross
* Re: XFS kernel BUG during generic/270 with v4.10
  2017-02-22 19:13 XFS kernel BUG during generic/270 with v4.10 Ross Zwisler
  2017-02-22 19:39 ` Ross Zwisler
@ 2017-03-02 16:29 ` Brian Foster
  2017-03-02 16:47   ` Darrick J. Wong
  2017-03-06 18:41   ` Ross Zwisler
  1 sibling, 2 replies; 10+ messages in thread
From: Brian Foster @ 2017-03-02 16:29 UTC (permalink / raw)
  To: Ross Zwisler; +Cc: Darrick J. Wong, linux-xfs

On Wed, Feb 22, 2017 at 12:13:00PM -0700, Ross Zwisler wrote:
> By running generic/270 in a loop on an XFS filesystem mounted with DAX I'm
> able to reliably generate the following kernel bug after a few (~10)
> iterations (output passed through kasan_symbolize.py):
>
> run fstests generic/270 at 2017-02-22 12:01:05
> XFS (pmem0p2): Unmounting Filesystem
> XFS (pmem0p2): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
> XFS (pmem0p2): Mounting V5 Filesystem
> XFS (pmem0p2): Ending clean mount
> XFS (pmem0p2): Quotacheck needed: Please wait.
> XFS (pmem0p2): Quotacheck: Done.
> XFS (pmem0p2): xlog_verify_grant_tail: space > BBTOB(tail_blocks)
> XFS: Assertion failed: XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_delayed_blks == 0, file: fs/xfs/xfs_super.c, line: 965

This means we've reclaimed an inode that still has delayed allocation
blocks, which shouldn't occur. We do have one recent fix in this area:
fa7f138 ("xfs: clear delalloc and cache on buffered write failure"). Do
you still reproduce this? If so, does it reproduce with that patch?

> ------------[ cut here ]------------
...
> ---[ end trace 384d06985052f068 ]---
>
> Here's the xfstests run:
>
> FSTYP -- xfs (debug)
> PLATFORM -- Linux/x86_64 alara 4.10.0
> MKFS_OPTIONS -- -f -bsize=4096 /dev/pmem0p2
> MOUNT_OPTIONS -- -o dax -o context=system_u:object_r:nfs_t:s0 /dev/pmem0p2 /mnt/xfstests_scratch
>
> generic/270 24s ..../check: line 596: 15817 Segmentation fault ./$seq > $tmp.rawout 2>&1
> [failed, exit status 139] - output mismatch (see /root/xfstests/results//generic/270.out.bad)
> --- tests/generic/270.out 2016-10-21 15:31:10.568945780 -0600
> +++ /root/xfstests/results//generic/270.out.bad 2017-02-22 12:01:29.272718284 -0700
> @@ -3,6 +3,3 @@
> Run fsstress
>
> Run dd writers in parallel
> -Comparing user usage
> -Comparing group usage
> -Comparing filesystem consistency
> ...
> (Run 'diff -u tests/generic/270.out /root/xfstests/results//generic/270.out.bad' to see the entire diff)
>
> This was done in my normal test setup, which is a pair of PMEM disks that
> enable DAX.
>

What I'm a little confused about though is that I thought DAX meant we
bypassed buffered I/O and always used direct I/O (which means you should
never perform delayed allocation). :/

Brian

> Here are the versions of xfstests and xfsprogs that I'm using:
>
> xfstests: f438604 generic: test mmap io through DAX and non-DAX
>
> xfsprogs: xfs_admin version 4.9.0
> This is just the xfsprogs that comes packaged with Fedora 25.
>
> Thanks,
> - Ross
* Re: XFS kernel BUG during generic/270 with v4.10
  2017-03-02 16:29 ` Brian Foster
@ 2017-03-02 16:47   ` Darrick J. Wong
  2017-03-02 17:13     ` Brian Foster
  2017-03-02 17:25     ` Eric Sandeen
  2017-03-06 18:41   ` Ross Zwisler
  1 sibling, 2 replies; 10+ messages in thread
From: Darrick J. Wong @ 2017-03-02 16:47 UTC (permalink / raw)
  To: Brian Foster; +Cc: Ross Zwisler, linux-xfs

On Thu, Mar 02, 2017 at 11:29:34AM -0500, Brian Foster wrote:
> On Wed, Feb 22, 2017 at 12:13:00PM -0700, Ross Zwisler wrote:
> > By running generic/270 in a loop on an XFS filesystem mounted with DAX I'm
> > able to reliably generate the following kernel bug after a few (~10)
> > iterations (output passed through kasan_symbolize.py):
> >
> > run fstests generic/270 at 2017-02-22 12:01:05
> > XFS (pmem0p2): Unmounting Filesystem
> > XFS (pmem0p2): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
> > XFS (pmem0p2): Mounting V5 Filesystem
> > XFS (pmem0p2): Ending clean mount
> > XFS (pmem0p2): Quotacheck needed: Please wait.
> > XFS (pmem0p2): Quotacheck: Done.
> > XFS (pmem0p2): xlog_verify_grant_tail: space > BBTOB(tail_blocks)
> > XFS: Assertion failed: XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_delayed_blks == 0, file: fs/xfs/xfs_super.c, line: 965
>
> This means we've reclaimed an inode that still has delayed allocation
> blocks, which shouldn't occur. We do have one recent fix in this area:
> fa7f138 ("xfs: clear delalloc and cache on buffered write failure"). Do
> you still reproduce this? If so, does it reproduce with that patch?
>
> > ------------[ cut here ]------------
> ...
> > ---[ end trace 384d06985052f068 ]---
> >
> > Here's the xfstests run:
> >
> > FSTYP -- xfs (debug)
> > PLATFORM -- Linux/x86_64 alara 4.10.0
> > MKFS_OPTIONS -- -f -bsize=4096 /dev/pmem0p2
> > MOUNT_OPTIONS -- -o dax -o context=system_u:object_r:nfs_t:s0 /dev/pmem0p2 /mnt/xfstests_scratch
> >
> > generic/270 24s ..../check: line 596: 15817 Segmentation fault ./$seq > $tmp.rawout 2>&1
> > [failed, exit status 139] - output mismatch (see /root/xfstests/results//generic/270.out.bad)
> > --- tests/generic/270.out 2016-10-21 15:31:10.568945780 -0600
> > +++ /root/xfstests/results//generic/270.out.bad 2017-02-22 12:01:29.272718284 -0700
> > @@ -3,6 +3,3 @@
> > Run fsstress
> >
> > Run dd writers in parallel
> > -Comparing user usage
> > -Comparing group usage
> > -Comparing filesystem consistency
> > ...
> > (Run 'diff -u tests/generic/270.out /root/xfstests/results//generic/270.out.bad' to see the entire diff)
> >
> > This was done in my normal test setup, which is a pair of PMEM disks that
> > enable DAX.
> >
>
> What I'm a little confused about though is that I thought DAX meant we
> bypassed buffered I/O and always used direct I/O (which means you should
> never perform delayed allocation). :/

The block devices are pmem, but I don't think g/270 does anything
special to turn on DAX (the inode flag) for the files it's writing.

--D

>
> Brian
>
> > Here are the versions of xfstests and xfsprogs that I'm using:
> >
> > xfstests: f438604 generic: test mmap io through DAX and non-DAX
> >
> > xfsprogs: xfs_admin version 4.9.0
> > This is just the xfsprogs that comes packaged with Fedora 25.
> >
> > Thanks,
> > - Ross
* Re: XFS kernel BUG during generic/270 with v4.10
  2017-03-02 16:47   ` Darrick J. Wong
@ 2017-03-02 17:13     ` Brian Foster
  2017-03-02 17:28       ` Darrick J. Wong
  2017-03-02 17:25     ` Eric Sandeen
  1 sibling, 1 reply; 10+ messages in thread
From: Brian Foster @ 2017-03-02 17:13 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Ross Zwisler, linux-xfs

On Thu, Mar 02, 2017 at 08:47:01AM -0800, Darrick J. Wong wrote:
> On Thu, Mar 02, 2017 at 11:29:34AM -0500, Brian Foster wrote:
> > On Wed, Feb 22, 2017 at 12:13:00PM -0700, Ross Zwisler wrote:
> > > By running generic/270 in a loop on an XFS filesystem mounted with DAX I'm
> > > able to reliably generate the following kernel bug after a few (~10)
> > > iterations (output passed through kasan_symbolize.py):
> > >
> > > run fstests generic/270 at 2017-02-22 12:01:05
> > > XFS (pmem0p2): Unmounting Filesystem
> > > XFS (pmem0p2): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
> > > XFS (pmem0p2): Mounting V5 Filesystem
> > > XFS (pmem0p2): Ending clean mount
> > > XFS (pmem0p2): Quotacheck needed: Please wait.
> > > XFS (pmem0p2): Quotacheck: Done.
> > > XFS (pmem0p2): xlog_verify_grant_tail: space > BBTOB(tail_blocks)
> > > XFS: Assertion failed: XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_delayed_blks == 0, file: fs/xfs/xfs_super.c, line: 965
> >
> > This means we've reclaimed an inode that still has delayed allocation
> > blocks, which shouldn't occur. We do have one recent fix in this area:
> > fa7f138 ("xfs: clear delalloc and cache on buffered write failure"). Do
> > you still reproduce this? If so, does it reproduce with that patch?
> >
> > > ------------[ cut here ]------------
> > ...
> > > ---[ end trace 384d06985052f068 ]---
> > >
> > > Here's the xfstests run:
> > >
> > > FSTYP -- xfs (debug)
> > > PLATFORM -- Linux/x86_64 alara 4.10.0
> > > MKFS_OPTIONS -- -f -bsize=4096 /dev/pmem0p2
> > > MOUNT_OPTIONS -- -o dax -o context=system_u:object_r:nfs_t:s0 /dev/pmem0p2 /mnt/xfstests_scratch
> > >
> > > generic/270 24s ..../check: line 596: 15817 Segmentation fault ./$seq > $tmp.rawout 2>&1
> > > [failed, exit status 139] - output mismatch (see /root/xfstests/results//generic/270.out.bad)
> > > --- tests/generic/270.out 2016-10-21 15:31:10.568945780 -0600
> > > +++ /root/xfstests/results//generic/270.out.bad 2017-02-22 12:01:29.272718284 -0700
> > > @@ -3,6 +3,3 @@
> > > Run fsstress
> > >
> > > Run dd writers in parallel
> > > -Comparing user usage
> > > -Comparing group usage
> > > -Comparing filesystem consistency
> > > ...
> > > (Run 'diff -u tests/generic/270.out /root/xfstests/results//generic/270.out.bad' to see the entire diff)
> > >
> > > This was done in my normal test setup, which is a pair of PMEM disks that
> > > enable DAX.
> > >
> >
> > What I'm a little confused about though is that I thought DAX meant we
> > bypassed buffered I/O and always used direct I/O (which means you should
> > never perform delayed allocation). :/
>
> The block devices are pmem, but I don't think g/270 does anything
> special to turn on DAX (the inode flag) for the files it's writing.
>

Doesn't the dax mount force the inode flag?

Brian

> --D
>
> >
> > Brian
> >
> > > Here are the versions of xfstests and xfsprogs that I'm using:
> > >
> > > xfstests: f438604 generic: test mmap io through DAX and non-DAX
> > >
> > > xfsprogs: xfs_admin version 4.9.0
> > > This is just the xfsprogs that comes packaged with Fedora 25.
> > >
> > > Thanks,
> > > - Ross
* Re: XFS kernel BUG during generic/270 with v4.10
  2017-03-02 17:13     ` Brian Foster
@ 2017-03-02 17:28       ` Darrick J. Wong
  0 siblings, 0 replies; 10+ messages in thread
From: Darrick J. Wong @ 2017-03-02 17:28 UTC (permalink / raw)
  To: Brian Foster; +Cc: Ross Zwisler, linux-xfs

On Thu, Mar 02, 2017 at 12:13:29PM -0500, Brian Foster wrote:
> On Thu, Mar 02, 2017 at 08:47:01AM -0800, Darrick J. Wong wrote:
> > On Thu, Mar 02, 2017 at 11:29:34AM -0500, Brian Foster wrote:
> > > On Wed, Feb 22, 2017 at 12:13:00PM -0700, Ross Zwisler wrote:
> > > > By running generic/270 in a loop on an XFS filesystem mounted with DAX I'm
> > > > able to reliably generate the following kernel bug after a few (~10)
> > > > iterations (output passed through kasan_symbolize.py):
> > > >
> > > > run fstests generic/270 at 2017-02-22 12:01:05
> > > > XFS (pmem0p2): Unmounting Filesystem
> > > > XFS (pmem0p2): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
> > > > XFS (pmem0p2): Mounting V5 Filesystem
> > > > XFS (pmem0p2): Ending clean mount
> > > > XFS (pmem0p2): Quotacheck needed: Please wait.
> > > > XFS (pmem0p2): Quotacheck: Done.
> > > > XFS (pmem0p2): xlog_verify_grant_tail: space > BBTOB(tail_blocks)
> > > > XFS: Assertion failed: XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_delayed_blks == 0, file: fs/xfs/xfs_super.c, line: 965
> > >
> > > This means we've reclaimed an inode that still has delayed allocation
> > > blocks, which shouldn't occur. We do have one recent fix in this area:
> > > fa7f138 ("xfs: clear delalloc and cache on buffered write failure"). Do
> > > you still reproduce this? If so, does it reproduce with that patch?
> > >
> > > > ------------[ cut here ]------------
> > > ...
> > > > ---[ end trace 384d06985052f068 ]---
> > > >
> > > > Here's the xfstests run:
> > > >
> > > > FSTYP -- xfs (debug)
> > > > PLATFORM -- Linux/x86_64 alara 4.10.0
> > > > MKFS_OPTIONS -- -f -bsize=4096 /dev/pmem0p2
> > > > MOUNT_OPTIONS -- -o dax -o context=system_u:object_r:nfs_t:s0 /dev/pmem0p2 /mnt/xfstests_scratch
> > > >
> > > > generic/270 24s ..../check: line 596: 15817 Segmentation fault ./$seq > $tmp.rawout 2>&1
> > > > [failed, exit status 139] - output mismatch (see /root/xfstests/results//generic/270.out.bad)
> > > > --- tests/generic/270.out 2016-10-21 15:31:10.568945780 -0600
> > > > +++ /root/xfstests/results//generic/270.out.bad 2017-02-22 12:01:29.272718284 -0700
> > > > @@ -3,6 +3,3 @@
> > > > Run fsstress
> > > >
> > > > Run dd writers in parallel
> > > > -Comparing user usage
> > > > -Comparing group usage
> > > > -Comparing filesystem consistency
> > > > ...
> > > > (Run 'diff -u tests/generic/270.out /root/xfstests/results//generic/270.out.bad' to see the entire diff)
> > > >
> > > > This was done in my normal test setup, which is a pair of PMEM disks that
> > > > enable DAX.
> > > >
> > >
> > > What I'm a little confused about though is that I thought DAX meant we
> > > bypassed buffered I/O and always used direct I/O (which means you should
> > > never perform delayed allocation). :/
> >
> > The block devices are pmem, but I don't think g/270 does anything
> > special to turn on DAX (the inode flag) for the files it's writing.
> >
>
> Doesn't the dax mount force the inode flag?

Oh, heh, I forgot that the mount flag still exists. You're right.

--D

>
> Brian
>
> > --D
> >
> > >
> > > Brian
> > >
> > > > Here are the versions of xfstests and xfsprogs that I'm using:
> > > >
> > > > xfstests: f438604 generic: test mmap io through DAX and non-DAX
> > > >
> > > > xfsprogs: xfs_admin version 4.9.0
> > > > This is just the xfsprogs that comes packaged with Fedora 25.
> > > >
> > > > Thanks,
> > > > - Ross
* Re: XFS kernel BUG during generic/270 with v4.10
  2017-03-02 16:47   ` Darrick J. Wong
  2017-03-02 17:13     ` Brian Foster
@ 2017-03-02 17:25     ` Eric Sandeen
  1 sibling, 0 replies; 10+ messages in thread
From: Eric Sandeen @ 2017-03-02 17:25 UTC (permalink / raw)
  To: Darrick J. Wong, Brian Foster; +Cc: Ross Zwisler, linux-xfs

On 3/2/17 10:47 AM, Darrick J. Wong wrote:
> On Thu, Mar 02, 2017 at 11:29:34AM -0500, Brian Foster wrote:
>> On Wed, Feb 22, 2017 at 12:13:00PM -0700, Ross Zwisler wrote:
>>> By running generic/270 in a loop on an XFS filesystem mounted with DAX I'm
>>> able to reliably generate the following kernel bug after a few (~10)
>>> iterations (output passed through kasan_symbolize.py):
>>>
>>> run fstests generic/270 at 2017-02-22 12:01:05
>>> XFS (pmem0p2): Unmounting Filesystem
>>> XFS (pmem0p2): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
>>> XFS (pmem0p2): Mounting V5 Filesystem
>>> XFS (pmem0p2): Ending clean mount
>>> XFS (pmem0p2): Quotacheck needed: Please wait.
>>> XFS (pmem0p2): Quotacheck: Done.
>>> XFS (pmem0p2): xlog_verify_grant_tail: space > BBTOB(tail_blocks)
>>> XFS: Assertion failed: XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_delayed_blks == 0, file: fs/xfs/xfs_super.c, line: 965

...

>>> This was done in my normal test setup, which is a pair of PMEM disks that
>>> enable DAX.
>>>
>>
>> What I'm a little confused about though is that I thought DAX meant we
>> bypassed buffered I/O and always used direct I/O (which means you should
>> never perform delayed allocation). :/
>
> The block devices are pmem, but I don't think g/270 does anything
> special to turn on DAX (the inode flag) for the files it's writing.

Ross's dmesg says:

    DAX enabled. Warning: EXPERIMENTAL, use at your own risk

which means he has "-o dax" in his mount options for xfstests.

-Eric
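Besides the dmesg line quoted above, the mount table is another place to confirm whether a filesystem was mounted with "-o dax". The sketch below parses text in the /proc/mounts format (device, mount point, type, comma-separated options). The helper names are hypothetical, and it assumes the option appears as a bare "dax" token, which is what a 4.10-era XFS mount is expected to show; other kernels may spell the option differently.

```python
def mount_options(mounts_text, mountpoint):
    """Return the set of mount options for `mountpoint`, or None if absent.

    `mounts_text` is expected to look like the contents of /proc/mounts.
    If the mount point is listed more than once, the last entry wins,
    since later mounts shadow earlier ones.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            found = set(fields[3].split(","))
    return found


def is_dax_mount(mounts_text, mountpoint):
    """True if `mountpoint` is listed with a bare "dax" mount option."""
    options = mount_options(mounts_text, mountpoint)
    return options is not None and "dax" in options


# Hypothetical usage on a live system:
#
#     with open("/proc/mounts") as f:
#         print(is_dax_mount(f.read(), "/mnt/xfstests_scratch"))
```

Note that this only reports the mount option; as the rest of the thread shows, it does not by itself prove that every inode on the filesystem is actually doing DAX I/O.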
* Re: XFS kernel BUG during generic/270 with v4.10
  2017-03-02 16:29 ` Brian Foster
  2017-03-02 16:47   ` Darrick J. Wong
@ 2017-03-06 18:41   ` Ross Zwisler
  2017-03-06 18:48     ` Darrick J. Wong
  2017-03-07 14:32     ` Brian Foster
  1 sibling, 2 replies; 10+ messages in thread
From: Ross Zwisler @ 2017-03-06 18:41 UTC (permalink / raw)
  To: Brian Foster; +Cc: Ross Zwisler, Darrick J. Wong, linux-xfs

On Thu, Mar 02, 2017 at 11:29:34AM -0500, Brian Foster wrote:
> On Wed, Feb 22, 2017 at 12:13:00PM -0700, Ross Zwisler wrote:
> > By running generic/270 in a loop on an XFS filesystem mounted with DAX I'm
> > able to reliably generate the following kernel bug after a few (~10)
> > iterations (output passed through kasan_symbolize.py):
> >
> > run fstests generic/270 at 2017-02-22 12:01:05
> > XFS (pmem0p2): Unmounting Filesystem
> > XFS (pmem0p2): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
> > XFS (pmem0p2): Mounting V5 Filesystem
> > XFS (pmem0p2): Ending clean mount
> > XFS (pmem0p2): Quotacheck needed: Please wait.
> > XFS (pmem0p2): Quotacheck: Done.
> > XFS (pmem0p2): xlog_verify_grant_tail: space > BBTOB(tail_blocks)
> > XFS: Assertion failed: XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_delayed_blks == 0, file: fs/xfs/xfs_super.c, line: 965
>
> This means we've reclaimed an inode that still has delayed allocation
> blocks, which shouldn't occur. We do have one recent fix in this area:
> fa7f138 ("xfs: clear delalloc and cache on buffered write failure"). Do
> you still reproduce this? If so, does it reproduce with that patch?

Cool, I've done a bunch more testing and have some interesting info.

First, this issue isn't specific to DAX. If I turn DAX off, it actually
reproduces much faster, usually on the first test run.

The branch I could find in the xfs repo that contained commit

fa7f138 ("xfs: clear delalloc and cache on buffered write failure")

was based on v4.10-rc6. Interestingly, this baseline does not reproduce this
issue, whereas v4.10 release reproduces it very consistently. The commit
between v4.10-rc6 and v4.10 that changes this behavior is:

d1908f52557b ("fs: break out of iomap_file_buffered_write on fatal signals")

As of this commit the problem reproduces very easily, but with the previous
commit I can't get it to happen at all.

So, once I figured out that I needed d1908f52557b to make the issue appear, I
tested v4.10 merged with different commits in the current xfs/for-next branch
to try and see if the commit you referenced above fixed the problem, and it
does appear to.

So, quick summary:

v4.10                         = failure
v4.10 + xfs/for-next          = success
v4.10 + fa7f138               = success
v4.10 + fa7f138~1 (4560e78)   = failure

So, as far as I can tell, fa7f138 does indeed seem to fix the issue.

I don't know if this issue was actually introduced by d1908f52557b, or if that
commit just changed things enough that the issue started happening much more
regularly?

> > ------------[ cut here ]------------
> ...
> > ---[ end trace 384d06985052f068 ]---
> >
> > Here's the xfstests run:
> >
> > FSTYP -- xfs (debug)
> > PLATFORM -- Linux/x86_64 alara 4.10.0
> > MKFS_OPTIONS -- -f -bsize=4096 /dev/pmem0p2
> > MOUNT_OPTIONS -- -o dax -o context=system_u:object_r:nfs_t:s0 /dev/pmem0p2 /mnt/xfstests_scratch
> >
> > generic/270 24s ..../check: line 596: 15817 Segmentation fault ./$seq > $tmp.rawout 2>&1
> > [failed, exit status 139] - output mismatch (see /root/xfstests/results//generic/270.out.bad)
> > --- tests/generic/270.out 2016-10-21 15:31:10.568945780 -0600
> > +++ /root/xfstests/results//generic/270.out.bad 2017-02-22 12:01:29.272718284 -0700
> > @@ -3,6 +3,3 @@
> > Run fsstress
> >
> > Run dd writers in parallel
> > -Comparing user usage
> > -Comparing group usage
> > -Comparing filesystem consistency
> > ...
> > (Run 'diff -u tests/generic/270.out /root/xfstests/results//generic/270.out.bad' to see the entire diff)
> >
> > This was done in my normal test setup, which is a pair of PMEM disks that
> > enable DAX.
> >
>
> What I'm a little confused about though is that I thought DAX meant we
> bypassed buffered I/O and always used direct I/O (which means you should
> never perform delayed allocation). :/

Sorry, I don't know about this one. :/
* Re: XFS kernel BUG during generic/270 with v4.10
  2017-03-06 18:41   ` Ross Zwisler
@ 2017-03-06 18:48     ` Darrick J. Wong
  2017-03-07 14:32     ` Brian Foster
  1 sibling, 0 replies; 10+ messages in thread
From: Darrick J. Wong @ 2017-03-06 18:48 UTC (permalink / raw)
  To: Ross Zwisler; +Cc: Brian Foster, linux-xfs

On Mon, Mar 06, 2017 at 11:41:09AM -0700, Ross Zwisler wrote:
> On Thu, Mar 02, 2017 at 11:29:34AM -0500, Brian Foster wrote:
> > On Wed, Feb 22, 2017 at 12:13:00PM -0700, Ross Zwisler wrote:
> > > By running generic/270 in a loop on an XFS filesystem mounted with DAX I'm
> > > able to reliably generate the following kernel bug after a few (~10)
> > > iterations (output passed through kasan_symbolize.py):
> > >
> > > run fstests generic/270 at 2017-02-22 12:01:05
> > > XFS (pmem0p2): Unmounting Filesystem
> > > XFS (pmem0p2): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
> > > XFS (pmem0p2): Mounting V5 Filesystem
> > > XFS (pmem0p2): Ending clean mount
> > > XFS (pmem0p2): Quotacheck needed: Please wait.
> > > XFS (pmem0p2): Quotacheck: Done.
> > > XFS (pmem0p2): xlog_verify_grant_tail: space > BBTOB(tail_blocks)
> > > XFS: Assertion failed: XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_delayed_blks == 0, file: fs/xfs/xfs_super.c, line: 965
> >
> > This means we've reclaimed an inode that still has delayed allocation
> > blocks, which shouldn't occur. We do have one recent fix in this area:
> > fa7f138 ("xfs: clear delalloc and cache on buffered write failure"). Do
> > you still reproduce this? If so, does it reproduce with that patch?
>
> Cool, I've done a bunch more testing and have some interesting info.
>
> First, this issue isn't specific to DAX. If I turn DAX off, it actually
> reproduces much faster, usually on the first test run.
>
> The branch I could find in the xfs repo that contained commit
>
> fa7f138 ("xfs: clear delalloc and cache on buffered write failure")
>
> was based on v4.10-rc6. Interestingly, this baseline does not reproduce this
> issue, whereas v4.10 release reproduces it very consistently. The commit
> between v4.10-rc6 and v4.10 that changes this behavior is:

That was merged in 4.11-rc1.

> d1908f52557b ("fs: break out of iomap_file_buffered_write on fatal signals")
>
> As of this commit the problem reproduces very easily, but with the previous
> commit I can't get it to happen at all.
>
> So, once I figured out that I needed d1908f52557b to make the issue appear, I
> tested v4.10 merged with different commits in the current xfs/for-next branch
> to try and see if the commit you referenced above fixed the problem, and it
> does appear to.
>
> So, quick summary:
>
> v4.10                         = failure
> v4.10 + xfs/for-next          = success
> v4.10 + fa7f138               = success
> v4.10 + fa7f138~1 (4560e78)   = failure
>
> So, as far as I can tell, fa7f138 does indeed seem to fix the issue.
>
> I don't know if this issue was actually introduced by d1908f52557b, or if that
> commit just changed things enough that the issue started happening much more
> regularly?

/me doesn't know either. I don't see anything in g/270 that would send
fatal signals....

>
> > > ------------[ cut here ]------------
> > ...
> > > ---[ end trace 384d06985052f068 ]---
> > >
> > > Here's the xfstests run:
> > >
> > > FSTYP -- xfs (debug)
> > > PLATFORM -- Linux/x86_64 alara 4.10.0
> > > MKFS_OPTIONS -- -f -bsize=4096 /dev/pmem0p2
> > > MOUNT_OPTIONS -- -o dax -o context=system_u:object_r:nfs_t:s0 /dev/pmem0p2 /mnt/xfstests_scratch
> > >
> > > generic/270 24s ..../check: line 596: 15817 Segmentation fault ./$seq > $tmp.rawout 2>&1
> > > [failed, exit status 139] - output mismatch (see /root/xfstests/results//generic/270.out.bad)
> > > --- tests/generic/270.out 2016-10-21 15:31:10.568945780 -0600
> > > +++ /root/xfstests/results//generic/270.out.bad 2017-02-22 12:01:29.272718284 -0700
> > > @@ -3,6 +3,3 @@
> > > Run fsstress
> > >
> > > Run dd writers in parallel
> > > -Comparing user usage
> > > -Comparing group usage
> > > -Comparing filesystem consistency
> > > ...
> > > (Run 'diff -u tests/generic/270.out /root/xfstests/results//generic/270.out.bad' to see the entire diff)
> > >
> > > This was done in my normal test setup, which is a pair of PMEM disks that
> > > enable DAX.
> > >
> >
> > What I'm a little confused about though is that I thought DAX meant we
> > bypassed buffered I/O and always used direct I/O (which means you should
> > never perform delayed allocation). :/
>
> Sorry, I don't know about this one. :/

I was also under the impression that DAX means no delalloc, but maybe
there's a defect somewhere?

--D
* Re: XFS kernel BUG during generic/270 with v4.10 2017-03-06 18:41 ` Ross Zwisler 2017-03-06 18:48 ` Darrick J. Wong @ 2017-03-07 14:32 ` Brian Foster 1 sibling, 0 replies; 10+ messages in thread From: Brian Foster @ 2017-03-07 14:32 UTC (permalink / raw) To: Ross Zwisler; +Cc: Darrick J. Wong, linux-xfs On Mon, Mar 06, 2017 at 11:41:09AM -0700, Ross Zwisler wrote: > On Thu, Mar 02, 2017 at 11:29:34AM -0500, Brian Foster wrote: > > On Wed, Feb 22, 2017 at 12:13:00PM -0700, Ross Zwisler wrote: > > > By running generic/270 in a loop on an XFS filesystem mounted with DAX I'm > > > able to reliably generate the following kernel bug after a few (~10) > > > iterations (output passed through kasan_symbolize.py): > > > > > > run fstests generic/270 at 2017-02-22 12:01:05 > > > XFS (pmem0p2): Unmounting Filesystem > > > XFS (pmem0p2): DAX enabled. Warning: EXPERIMENTAL, use at your own risk > > > XFS (pmem0p2): Mounting V5 Filesystem > > > XFS (pmem0p2): Ending clean mount > > > XFS (pmem0p2): Quotacheck needed: Please wait. > > > XFS (pmem0p2): Quotacheck: Done. > > > XFS (pmem0p2): xlog_verify_grant_tail: space > BBTOB(tail_blocks) > > > XFS: Assertion failed: XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_delayed_blks == 0, file: fs/xfs/xfs_super.c, line: 965 > > > > This means we've reclaimed an inode that still has delayed allocation > > blocks, which shouldn't occur. We do have one recent fix in this area: > > fa7f138 ("xfs: clear delalloc and cache on buffered write failure"). Do > > you still reproduce this? If so, does it reproduce with that patch? > > Cool, I've done a bunch more testing and have some interesting info. > > First, this issue isn't specific to DAX. If I turn DAX off, it actually > reproduces much faster, usually on the first test run. > > The branch I could find in the xfs repo that contained commit > > fa7f138 ("xfs: clear delalloc and cache on buffered write failure") > > Was based on v4.10-rc6. 
> Interestingly, this baseline does not reproduce this
> issue, whereas v4.10 release reproduces it very consistently. The commit
> between v4.10-rc6 and v4.10 that changes this behavior is:
>
> d1908f52557b ("fs: break out of iomap_file_buffered_write on fatal signals")
>
> As of this commit the problem reproduces very easily, but with the previous
> commit I can't get it to happen at all.
>
> So, once I figured out that I needed d1908f52557b to make the issue appear, I
> tested v4.10 merged with different commits in the current xfs/for-next branch
> to try and see if the commit you referenced above fixed the problem, and it
> does appear to.
>
> So, quick summary:
>
> v4.10 = failure
> v4.10 + xfs/for_next = success
> v4.10 + fa7f138 = success
> v4.10 + fa7f138~1 (4560e78) = failure
>
> So, as far as I can tell, fa7f138 does indeed seem to fix the issue.
>

Ok, great. Thanks for working that all out.

> I don't know if this issue was actually introduced by d1908f52557b, or if that
> commit just changed things enough that the issue started happening much more
> regularly?
>

I think the latter.. I originally hit the problem fixed by fa7f138 when
adding a minor error injection hack to facilitate an xfstests test
(related to your previous report, iirc). That hack resulted in the
possibility of 'written == 0' cases for the XFS ->iomap_end() handler
which were otherwise unlikely due to the limited error possibilities
between the time ->iomap_begin() returns successfully and ->iomap_end()
is invoked.

Commit d1908f52557b adds another error check right in that window
(iomap_write_actor()->iomap_write_begin()->fatal_signal_pending()),
which apparently turned this into a much more likely case to hit in
practice (wrt to generic/270, I'm guessing when the fsstress processes
are killed).

Note that we do still have one more issue with this code with a fix
in-progress[1].

> > > ------------[ cut here ]------------
> > ...
> > > ---[ end trace 384d06985052f068 ]---
> > >
> > > Here's the xfstests run:
> > >
> > > FSTYP -- xfs (debug)
> > > PLATFORM -- Linux/x86_64 alara 4.10.0
> > > MKFS_OPTIONS -- -f -bsize=4096 /dev/pmem0p2
> > > MOUNT_OPTIONS -- -o dax -o context=system_u:object_r:nfs_t:s0 /dev/pmem0p2 /mnt/xfstests_scratch
> > >
> > > generic/270 24s ..../check: line 596: 15817 Segmentation fault ./$seq > $tmp.rawout 2>&1
> > > [failed, exit status 139] - output mismatch (see /root/xfstests/results//generic/270.out.bad)
> > > --- tests/generic/270.out 2016-10-21 15:31:10.568945780 -0600
> > > +++ /root/xfstests/results//generic/270.out.bad 2017-02-22 12:01:29.272718284 -0700
> > > @@ -3,6 +3,3 @@
> > >  Run fsstress
> > >
> > >  Run dd writers in parallel
> > > -Comparing user usage
> > > -Comparing group usage
> > > -Comparing filesystem consistency
> > > ...
> > > (Run 'diff -u tests/generic/270.out /root/xfstests/results//generic/270.out.bad' to see the entire diff)
> > >
> > > This was done in my normal test setup, which is a pair of PMEM disks that
> > > enable DAX.
> > >
> >
> > What I'm a little confused about though is that I thought DAX meant we
> > bypassed buffered I/O and always used direct I/O (which means you should
> > never perform delayed allocation). :/
>
> Sorry, I don't know about this one. :/

This is kind of a separate question, more pertaining why this issue is
even possible with DAX as opposed to purely buffered writes (where it is
expected prior to the fix in commit fa7f138).

With the info above, I ran generic/270 against an xfs for-next kernel
with d1908f52557b applied and fa7f138 reverted and reproduced the issue
after several iterations. Digging further, I see that buffered writes do
indeed occur and what looks like is going on here is that our hacky
XFS_MOUNT_DAX logic in xfs_diflags_to_iflags() doesn't survive an
XFS_IOC_FSSETXATTR on the inode.
That ioctl is called by fsstress to set/change things like extent size
allocation hints, sync/append mode, project quota id, etc.

It looks like the intent of the XFS_MOUNT_DAX thing is to set S_DAX for
every inode in memory, regardless of the on disk flags. It doesn't
actually change the on-disk flags. If a XFS_IOC_FSGETXATTR/FSSETXATTR
cycle occurs that makes some unrelated change on the inode, we
effectively revalidate the in-core inode flags based on what is on-disk
(i.e., we clear S_DAX). From that point forward (unless the inode is
reclaimed and re-read from disk I suppose), the DAX state is lost on the
inode.

My understanding is the '-o dax' mount option is a debug tool and slated
for removal. Given that, I'm not sure how critical a problem this really
is. If the mount option is going to stay around, I think something like
the appended hunk can probably fix it up. Otherwise, it might be a good
idea to disable the setxattr fsstress op and otherwise avoid any such
inode attribute changes when using it, or to set the DAX flag on disk
explicitly.

Brian

[1] http://www.spinics.net/lists/linux-xfs/msg04685.html

---8<---

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index edfa6a5..e6a4c38 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -686,8 +686,12 @@ xfs_ip2xflags(
 	struct xfs_inode	*ip)
 {
 	struct xfs_icdinode	*dic = &ip->i_d;
+	uint			flags;
 
-	return _xfs_dic2xflags(dic->di_flags, dic->di_flags2, XFS_IFORK_Q(ip));
+	flags = _xfs_dic2xflags(dic->di_flags, dic->di_flags2, XFS_IFORK_Q(ip));
+	if ((ip->i_mount->m_flags & XFS_MOUNT_DAX) && IS_DAX(VFS_I(ip)))
+		flags |= FS_XFLAG_DAX;
+	return flags;
 }
 
 /*
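
The flag loss described in this message can be illustrated with a small
userspace model. This is a hedged sketch, not kernel code: every model_*
function and MODEL_* constant is hypothetical, standing in for
XFS_MOUNT_DAX, S_DAX and FS_XFLAG_DAX, and the real code paths
(xfs_diflags_to_iflags(), xfs_ip2xflags()) do considerably more. It
assumes the "set" half of the cycle rederives the in-memory DAX state
purely from the flags it is handed, which is how the message
characterizes the behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define MODEL_XFLAG_DAX    0x1u  /* plays the role of FS_XFLAG_DAX */
#define MODEL_XFLAG_APPEND 0x2u  /* an unrelated attribute fsstress might toggle */

struct model_mount {
	bool mount_dax;           /* plays the role of XFS_MOUNT_DAX (-o dax) */
};

struct model_inode {
	unsigned int disk_xflags; /* flags as stored on disk */
	bool incore_dax;          /* plays the role of S_DAX on the in-memory inode */
};

/* Reading the inode in: the mount option forces DAX on in memory only. */
static void model_iget(const struct model_mount *mp, struct model_inode *ip)
{
	ip->incore_dax = mp->mount_dax ||
			 (ip->disk_xflags & MODEL_XFLAG_DAX) != 0;
}

/*
 * The "get" half of the cycle. Without the fix only the on-disk flags are
 * reported; with the fix, a mount-forced DAX state is reported as well,
 * mirroring the idea of the hunk in the message.
 */
static unsigned int model_getxflags(const struct model_mount *mp,
				    const struct model_inode *ip,
				    bool with_fix)
{
	unsigned int flags = ip->disk_xflags;

	if (with_fix && mp->mount_dax && ip->incore_dax)
		flags |= MODEL_XFLAG_DAX;
	return flags;
}

/* The "set" half: store the flags and rederive the in-memory state. */
static void model_setxflags(struct model_inode *ip, unsigned int xflags)
{
	ip->disk_xflags = xflags;
	ip->incore_dax = (xflags & MODEL_XFLAG_DAX) != 0;
}

/*
 * One get/modify/set cycle making an unrelated change on a -o dax mount.
 * Returns whether the inode is still DAX in memory afterwards.
 */
bool model_dax_survives_cycle(bool with_fix)
{
	struct model_mount mp = { .mount_dax = true };
	struct model_inode ip = { .disk_xflags = 0, .incore_dax = false };
	unsigned int xflags;

	model_iget(&mp, &ip);
	assert(ip.incore_dax);        /* forced on by the mount option */

	xflags = model_getxflags(&mp, &ip, with_fix);
	xflags |= MODEL_XFLAG_APPEND; /* the unrelated change */
	model_setxflags(&ip, xflags);

	return ip.incore_dax;
}
```

In this model, model_dax_survives_cycle(false) returns false because the
unrelated attribute change drops the in-memory DAX state, while
model_dax_survives_cycle(true) returns true. A side effect of the
model's fixed path is that the DAX bit also ends up in the stored flags;
whether the real hunk has that effect is not stated in the thread.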