public inbox for linux-xfs@vger.kernel.org
* [6.15-rc2 regression] xfs: null pointer in the dax fault code
@ 2025-04-16 17:43 Darrick J. Wong
  2025-04-16 20:22 ` Darrick J. Wong
  2025-04-16 20:50 ` Dave Chinner
  0 siblings, 2 replies; 4+ messages in thread
From: Darrick J. Wong @ 2025-04-16 17:43 UTC
  To: linux-fsdevel; +Cc: linux-xfs

Hi folks,

After upgrading to 6.15-rc2, I see the following crash in (I think?) the
DAX code on xfs/593 (which is a fairly boring fsck test).

MKFS_OPTIONS=" -m metadir=1,autofsck=1,uquota,gquota,pquota, -d daxinherit=1,"
MOUNT_OPTIONS=""

Any ideas?  Does this stack trace ring a bell for anyone?

--D

BUG: kernel NULL pointer dereference, address: 00000000000008a8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0 
Oops: Oops: 0000 [#1] SMP
CPU: 2 UID: 0 PID: 1717921 Comm: fsstress Tainted: G        W           6.15.0-rc2-xfsx #rc2 PREEMPT(lazy)
Tainted: [W]=WARN
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:__lruvec_stat_mod_folio+0x50/0xd0
Code: e8 85 70 da ff 48 8b 53 38 48 89 d0 48 83 e0 f8 83 e2 02 74 04 48 8b 40 10 48 63 d5 48 85 c0 74 50 6
0 75 54 44
RSP: 0000:ffffc9000679fa20 EFLAGS: 00010206
RAX: 0000000000000200 RBX: ffffea000e298040 RCX: 0000000000000001
RDX: 0000000000000001 RSI: 0000000000000012 RDI: ffffea000e298040
RBP: 0000000000000001 R08: 8000000000000025 R09: 0000000000000001
R10: 0000000000001000 R11: ffffc9000679fc10 R12: 0000000000000012
R13: ffff88807ffd9d80 R14: ffff888040a79a80 R15: ffffea000e298040
FS:  00007f2dce659740(0000) GS:ffff8880fb85e000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000008a8 CR3: 000000005f22a003 CR4: 00000000001706f0
Call Trace:
 <TASK>
 folio_add_file_rmap_ptes+0x109/0x200
 insert_page_into_pte_locked+0x1b6/0x340
 insert_page+0x93/0xc0
 vmf_insert_page_mkwrite+0x2d/0x50
 dax_fault_iter+0x330/0x730
 dax_iomap_pte_fault+0x1a9/0x3e0
 __xfs_write_fault+0x11d/0x290 [xfs 05d1f477986dfc3e3925c4fd18979e6f6f9a9e35]
 __do_fault+0x2d/0x170
 do_fault+0xc8/0x680
 __handle_mm_fault+0x5ba/0x1030
 handle_mm_fault+0x18f/0x280
 do_user_addr_fault+0x481/0x7e0
 exc_page_fault+0x62/0x130
 asm_exc_page_fault+0x22/0x30
RIP: 0033:0x7f2dce7af44a
Code: c5 fe 7f 07 c5 fe 7f 47 20 c5 fe 7f 47 40 c5 fe 7f 47 60 c5 f8 77 c3 66 0f 1f 84 00 00 00 00 00 40 0
0 00 66 90
RSP: 002b:00007fffe75b3198 EFLAGS: 00010202
RAX: 0000000000000054 RBX: 000000000008c000 RCX: 0000000000000dfd
RDX: 00007f2dce63e000 RSI: 0000000000000054 RDI: 00007f2dce658000
RBP: 000000000001adfd R08: 0000000000000005 R09: 000000000008c000
R10: 0000000000000008 R11: 0000000000000246 R12: 00007fffe75b31e0
R13: 8f5c28f5c28f5c29 R14: 00007fffe75b37a0 R15: 000056030c3be570
 </TASK>
Modules linked in: ext4 crc16 mbcache jbd2 xfs nft_chain_nat xt_REDIRECT nf_nat nf_conntrack nf_defrag_ipv
set_hash_ip ip_set_hash_net xt_set nft_compat ip_set_hash_mac ip_set nf_tables nfnetlink sha512_ssse3 ahci
d_btt sch_fq_codel loop fuse configfs ip_tables x_tables overlay nfsv4 af_packet
Dumping ftrace buffer:
   (ftrace buffer empty)
CR2: 00000000000008a8
---[ end trace 0000000000000000 ]---



* Re: [6.15-rc2 regression] xfs: null pointer in the dax fault code
  2025-04-16 17:43 [6.15-rc2 regression] xfs: null pointer in the dax fault code Darrick J. Wong
@ 2025-04-16 20:22 ` Darrick J. Wong
  2025-04-16 20:50 ` Dave Chinner
  1 sibling, 0 replies; 4+ messages in thread
From: Darrick J. Wong @ 2025-04-16 20:22 UTC
  To: linux-fsdevel; +Cc: linux-xfs

On Wed, Apr 16, 2025 at 10:43:58AM -0700, Darrick J. Wong wrote:
> Hi folks,
> 
> After upgrading to 6.15-rc2, I see the following crash in (I think?) the
> DAX code on xfs/593 (which is a fairly boring fsck test).
> 
> MKFS_OPTIONS=" -m metadir=1,autofsck=1,uquota,gquota,pquota, -d daxinherit=1,"
> MOUNT_OPTIONS=""
> 
> Any ideas?  Does this stack trace ring a bell for anyone?
> 
> --D

On Wed, Apr 16, 2025 at 07:38:36PM +0100, Matthew Wilcox wrote:
> On Wed, Apr 16, 2025 at 11:08:37AM -0700, Darrick J. Wong wrote:
> > Hi folks,
> > 
> > I upgraded my arm64 kernel to 6.15-rc2, and I also see this splat in
> > generic/363.  The fstests config is as follows:
> > 
> > MKFS_OPTIONS="-m metadir=1,autofsck=1,uquota,gquota,pquota, -b size=65536,"
> > MOUNT_OPTIONS=""
> > 
> > The VM is arm64 with 64k base pages.  I've disabled LBS to work around
> > a fair number of other strange bugs.  Does this ring a bell for anyone?
> > 
> > --D
> > 
> > list_add double add: new=ffffffff40538c88, prev=fffffc03febf8148, next=ffffffff40538c88.
> 
> Not a bell, but it's weird.  We're trying to add ffffffff40538c88 to
> the list, but next already has that value.  So this is a double-free of
> the folio?  Do you have VM_BUG_ON_FOLIO enabled with CONFIG_DEBUG_VM?
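
For anyone following along, the "list_add double add" message quoted above
comes from the list debugging sanity check, which refuses to insert an entry
that is already one of its would-be neighbours.  Below is a rough Python
model of that logic, not the kernel source: the names ListHead,
list_add_valid and list_add are stand-ins chosen for illustration, and the
order of the checks is an assumption.

```python
class ListHead:
    """Minimal stand-in for a circular doubly linked list node."""

    def __init__(self, name):
        self.name = name
        # An empty list head points at itself in both directions.
        self.prev = self
        self.next = self


def list_add_valid(new, prev, nxt):
    """Return None if inserting `new` between `prev` and `nxt` looks sane,
    otherwise a short description of what is wrong."""
    if nxt.prev is not prev:
        return "list_add corruption: next->prev should be prev"
    if prev.next is not nxt:
        return "list_add corruption: prev->next should be next"
    if new is prev or new is nxt:
        # The entry is already adjacent to the insertion point.
        return "list_add double add"
    return None


def list_add(new, head):
    """Insert `new` right after `head`, rejecting insertions that fail the check."""
    nxt = head.next
    problem = list_add_valid(new, head, nxt)
    if problem is not None:
        raise RuntimeError(problem)
    nxt.prev = new
    new.next = nxt
    new.prev = head
    head.next = new


head = ListHead("head")
folio = ListHead("folio")
list_add(folio, head)        # first insertion succeeds

try:
    list_add(folio, head)    # second insertion: new is already head.next
    result = None
except RuntimeError as err:
    result = str(err)
```

Here the second insertion trips the double-add branch because new and next
are the same object, which matches new == next in the splat while prev is a
different address.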

Note that xfs/593 on x86_64 (same config but no pmem) seems to have
stalled with:

run fstests xfs/593 at 2025-04-16 03:33:38
XFS (sda3): EXPERIMENTAL metadata directory tree feature enabled.  Use at your own risk!
XFS (sda3): EXPERIMENTAL exchange range feature enabled.  Use at your own risk!
XFS (sda3): EXPERIMENTAL parent pointer feature enabled.  Use at your own risk!
XFS (sda3): Mounting V5 Filesystem c261642e-620a-4382-88f7-648a25d82213
XFS (sda3): Ending clean mount
XFS (sda3): EXPERIMENTAL online scrub feature enabled.  Use at your own risk!
XFS (sda4): EXPERIMENTAL metadata directory tree feature enabled.  Use at your own risk!
XFS (sda4): EXPERIMENTAL exchange range feature enabled.  Use at your own risk!
XFS (sda4): EXPERIMENTAL parent pointer feature enabled.  Use at your own risk!
XFS (sda4): Mounting V5 Filesystem 8db5080b-35d5-4308-90bf-cab53746ab63
XFS (sda4): Ending clean mount
XFS (sda4): Quotacheck needed: Please wait.
XFS (sda4): Quotacheck: Done.
XFS (sda4): EXPERIMENTAL online scrub feature enabled.  Use at your own risk!
INFO: task u16:4:277318 blocked for more than 61 seconds.
      Not tainted 6.15.0-rc2-xfsx #rc2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:u16:4           state:D stack:10728 pid:277318 tgid:277318 ppid:2      task_flags:0x4208060 flags:0x00004000
Workqueue: writeback wb_workfn (flush-8:0)
Call Trace:
 <TASK>
 __schedule+0x458/0x14c0
 ? folio_wait_bit_common+0x118/0x330
 schedule+0x2a/0xe0
 io_schedule+0x4c/0x70
 folio_wait_bit_common+0x144/0x330
 ? filemap_get_read_batch+0x330/0x330
 writeback_iter+0x305/0x340
 iomap_writepages+0x6f/0xb60
 xfs_vm_writepages+0x7c/0xf0 [xfs 05d1f477986dfc3e3925c4fd18979e6f6f9a9e35]
 do_writepages+0x82/0x280
 ? sched_balance_find_src_group+0x4d/0x500
 __writeback_single_inode+0x3d/0x330
 ? do_raw_spin_unlock+0x49/0xb0
 writeback_sb_inodes+0x21c/0x4e0
 wb_writeback+0x99/0x320
 wb_workfn+0xc0/0x410
 process_one_work+0x195/0x3d0
 worker_thread+0x264/0x380
 ? _raw_spin_unlock_irqrestore+0x1e/0x40
 ? rescuer_thread+0x4f0/0x4f0
 kthread+0x117/0x270
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x2d/0x50
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork_asm+0x11/0x20
 </TASK>
INFO: task fsstress:503330 blocked for more than 61 seconds.
      Not tainted 6.15.0-rc2-xfsx #rc2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:fsstress        state:D stack:10872 pid:503330 tgid:503330 ppid:503327 task_flags:0x400140 flags:0x00000000
Call Trace:
 <TASK>
 __schedule+0x458/0x14c0
 ? wb_queue_work+0x8e/0x100
 schedule+0x2a/0xe0
 wb_wait_for_completion+0x8d/0xc0
 ? cpuacct_css_alloc+0xa0/0xa0
 __writeback_inodes_sb_nr+0x9f/0xc0
 sync_filesystem+0x29/0x90
 __x64_sys_syncfs+0x40/0xa0
 do_syscall_64+0x37/0xf0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f0a3b837d17
RSP: 002b:00007ffd2eea1988 EFLAGS: 00000206 ORIG_RAX: 0000000000000132
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f0a3b837d17
RDX: 0000000000000000 RSI: 00005603860ff430 RDI: 0000000000000005
RBP: 000000000000eb44 R08: 000000000000005b R09: 00007f0a3b92f000
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000001
R13: 8f5c28f5c28f5c29 R14: 00007ffd2eea19d0 R15: 0000560348867790
 </TASK>
INFO: task fsstress:503331 blocked for more than 61 seconds.
      Not tainted 6.15.0-rc2-xfsx #rc2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:fsstress        state:D stack:10872 pid:503331 tgid:503331 ppid:503327 task_flags:0x400140 flags:0x00000000
Call Trace:
 <TASK>
 __schedule+0x458/0x14c0
 ? wb_queue_work+0x8e/0x100
 schedule+0x2a/0xe0
 wb_wait_for_completion+0x8d/0xc0
 ? cpuacct_css_alloc+0xa0/0xa0
 __writeback_inodes_sb_nr+0x9f/0xc0
 sync_filesystem+0x29/0x90
 __x64_sys_syncfs+0x40/0xa0
 do_syscall_64+0x37/0xf0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f0a3b837d17
RSP: 002b:00007ffd2eea1988 EFLAGS: 00000206 ORIG_RAX: 0000000000000132
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f0a3b837d17
RDX: 0000000000000000 RSI: 00005603860ff430 RDI: 0000000000000005
RBP: 000000000000cc4e R08: 0000000000000036 R09: 00007f0a3b92f000
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000001
R13: 8f5c28f5c28f5c29 R14: 00007ffd2eea19d0 R15: 0000560348867790
 </TASK>
INFO: task fsstress:503332 blocked for more than 61 seconds.
      Not tainted 6.15.0-rc2-xfsx #rc2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:fsstress        state:D stack:10872 pid:503332 tgid:503332 ppid:503327 task_flags:0x400140 flags:0x00000000
Call Trace:
 <TASK>
 __schedule+0x458/0x14c0
 ? wb_queue_work+0x8e/0x100
 schedule+0x2a/0xe0
 wb_wait_for_completion+0x8d/0xc0
 ? cpuacct_css_alloc+0xa0/0xa0
 __writeback_inodes_sb_nr+0x9f/0xc0
 sync_filesystem+0x29/0x90
 __x64_sys_syncfs+0x40/0xa0
 do_syscall_64+0x37/0xf0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f0a3b837d17
RSP: 002b:00007ffd2eea1988 EFLAGS: 00000206 ORIG_RAX: 0000000000000132
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f0a3b837d17
RDX: 0000000000000000 RSI: 00005603860ff430 RDI: 0000000000000005
RBP: 000000000000ecb4 R08: 0000000000000039 R09: 00007f0a3b92f000
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000001
R13: 8f5c28f5c28f5c29 R14: 00007ffd2eea19d0 R15: 0000560348867790
 </TASK>
INFO: task fsstress:503333 blocked for more than 61 seconds.
      Not tainted 6.15.0-rc2-xfsx #rc2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:fsstress        state:D stack:10872 pid:503333 tgid:503333 ppid:503327 task_flags:0x400140 flags:0x00000000
Call Trace:
 <TASK>
 __schedule+0x458/0x14c0
 ? wb_queue_work+0x8e/0x100
 schedule+0x2a/0xe0
 wb_wait_for_completion+0x8d/0xc0
 ? cpuacct_css_alloc+0xa0/0xa0
 __writeback_inodes_sb_nr+0x9f/0xc0
 sync_filesystem+0x29/0x90
 __x64_sys_syncfs+0x40/0xa0
 do_syscall_64+0x37/0xf0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f0a3b837d17
RSP: 002b:00007ffd2eea1988 EFLAGS: 00000206 ORIG_RAX: 0000000000000132
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f0a3b837d17
RDX: 0000000000000000 RSI: 00005603860ff430 RDI: 0000000000000005
RBP: 000000000000df01 R08: 000000000000006d R09: 00007f0a3b92f000
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000001
R13: 8f5c28f5c28f5c29 R14: 00007ffd2eea19d0 R15: 0000560348867790
 </TASK>

--D


* Re: [6.15-rc2 regression] xfs: null pointer in the dax fault code
  2025-04-16 17:43 [6.15-rc2 regression] xfs: null pointer in the dax fault code Darrick J. Wong
  2025-04-16 20:22 ` Darrick J. Wong
@ 2025-04-16 20:50 ` Dave Chinner
  2025-04-17  3:01   ` Darrick J. Wong
  1 sibling, 1 reply; 4+ messages in thread
From: Dave Chinner @ 2025-04-16 20:50 UTC
  To: Darrick J. Wong; +Cc: linux-fsdevel, linux-xfs

On Wed, Apr 16, 2025 at 10:43:58AM -0700, Darrick J. Wong wrote:
> Hi folks,
> 
> After upgrading to 6.15-rc2, I see the following crash in (I think?) the
> DAX code on xfs/593 (which is a fairly boring fsck test).
> 
> MKFS_OPTIONS=" -m metadir=1,autofsck=1,uquota,gquota,pquota, -d daxinherit=1,"
> MOUNT_OPTIONS=""
> 
> Any ideas?  Does this stack trace ring a bell for anyone?

That looks like the stack trace in this patch posted to -fsdevel a
week ago:

https://lore.kernel.org/linux-fsdevel/20250410091020.119116-1-david@redhat.com/

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [6.15-rc2 regression] xfs: null pointer in the dax fault code
  2025-04-16 20:50 ` Dave Chinner
@ 2025-04-17  3:01   ` Darrick J. Wong
  0 siblings, 0 replies; 4+ messages in thread
From: Darrick J. Wong @ 2025-04-17  3:01 UTC
  To: Dave Chinner; +Cc: linux-fsdevel, linux-xfs

On Thu, Apr 17, 2025 at 06:50:51AM +1000, Dave Chinner wrote:
> On Wed, Apr 16, 2025 at 10:43:58AM -0700, Darrick J. Wong wrote:
> > Hi folks,
> > 
> > After upgrading to 6.15-rc2, I see the following crash in (I think?) the
> > DAX code on xfs/593 (which is a fairly boring fsck test).
> > 
> > MKFS_OPTIONS=" -m metadir=1,autofsck=1,uquota,gquota,pquota, -d daxinherit=1,"
> > MOUNT_OPTIONS=""
> > 
> > Any ideas?  Does this stack trace ring a bell for anyone?
> 
> That looks like the stack trace in this patch posted to -fsdevel a
> week ago:
> 
> https://lore.kernel.org/linux-fsdevel/20250410091020.119116-1-david@redhat.com/

Ah, thanks.  Will try that patch out.
--D

> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 

