public inbox for fstests@vger.kernel.org
 help / color / mirror / Atom feed
* [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply
@ 2023-12-18 14:01 Zorro Lang
  2023-12-18 17:48 ` Darrick J. Wong
  2024-01-18  4:20 ` Dave Chinner
  0 siblings, 2 replies; 7+ messages in thread
From: Zorro Lang @ 2023-12-18 14:01 UTC (permalink / raw)
  To: linux-xfs; +Cc: fstests

Hi,

Recently I hit a crash [1] on s390x with 64k directory block size xfs
(-n size=65536 -m crc=1,finobt=1,reflink=1,rmapbt=0,bigtime=1,inobtcount=1),
even not panic, a assertion failure will happen.

I found it from an old downstream kernel at first, then reproduced it
on latest upstream mainline linux (v6.7-rc6). Can't be sure how long
time this issue be there, just reported it at first.

I hit and reproduced it several times by running generic/648 on s390x
with "-n size=65536" xfs many times. Not sure if s390x is needed, I'll
try to check other arches.

Thanks,
Zorro

[1]
 [  899.271221] run fstests generic/648 at 2023-12-18 01:18:03
 [  899.454444] XFS (loop1): Mounting V5 Filesystem 9be840e1-9bd1-468e-81c5-179f2ebdce56
 [  899.455844] XFS (loop1): Ending clean mount
 [  899.463623] XFS (loop1): Unmounting Filesystem 9be840e1-9bd1-468e-81c5-179f2ebdce56
 [  899.587365] XFS (dm-3): Mounting V5 Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
 [  899.588501] XFS (dm-3): Ending clean mount
 ...
 ...
 [  974.217814] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
 [  974.803146] XFS (loop3): Starting recovery (logdev: internal)
 [  974.804634] XFS (loop3): Ending recovery (logdev: internal)
 [  975.782734] Direct I/O collision with buffered writes! File: /p2/df/f4f Comm: fsstress
 [  976.831942] lo_write_bvec: 15 callbacks suppressed
 [  976.831946] loop: Write error at byte offset 1766129664, length 4096.
 [  976.831953] I/O error, dev loop3, sector 3449468 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
 [  976.831957] I/O error, dev loop3, sector 3449468 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
 [  976.831960] XFS (loop3): log I/O error -5
 [  976.831963] XFS (loop3): Filesystem has been shut down due to log error (0x2).
 [  976.831964] XFS (loop3): Please unmount the filesystem and rectify the problem(s).
 [  976.839603] buffer_io_error: 6 callbacks suppressed
 [  976.839604] Buffer I/O error on dev dm-3, logical block 20971392, async page read
 [  976.839607] Buffer I/O error on dev dm-3, logical block 20971393, async page read
 [  976.839609] Buffer I/O error on dev dm-3, logical block 20971394, async page read
 [  976.839611] Buffer I/O error on dev dm-3, logical block 20971395, async page read
 [  976.839612] Buffer I/O error on dev dm-3, logical block 20971396, async page read
 [  976.839613] Buffer I/O error on dev dm-3, logical block 20971397, async page read
 [  976.839614] Buffer I/O error on dev dm-3, logical block 20971398, async page read
 [  976.839616] Buffer I/O error on dev dm-3, logical block 20971399, async page read
 [  977.419266] XFS (loop3): Unmounting Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
 [  977.423981] iomap_finish_ioend: 15 callbacks suppressed
 [  977.423982] dm-3: writeback error on inode 131, offset 1766125568, sector 4059696
 [  977.423988] I/O error, dev loop3, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 2
 [  978.434124] XFS (dm-3): Unmounting Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
 [  978.449156] XFS (dm-3): log I/O error -5
 [  978.449159] XFS (dm-3): Filesystem has been shut down due to log error (0x2).
 [  978.449160] XFS (dm-3): Please unmount the filesystem and rectify the problem(s).
 [  978.449163] XFS (dm-3): log I/O error -5
 [  978.449164] XFS (dm-3): log I/O error -5
 [  978.449165] XFS (dm-3): log I/O error -5
 [  978.449166] XFS (dm-3): log I/O error -5
 [  978.449167] XFS (dm-3): log I/O error -5
 [  978.498053] XFS (dm-3): Mounting V5 Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
 [  978.513331] XFS (dm-3): Starting recovery (logdev: internal)
 [  978.584227] XFS (dm-3): Ending recovery (logdev: internal)
 [  978.587276] loop3: detected capacity change from 0 to 6876346
 [  978.591588] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
 [  979.216565] XFS (loop3): Starting recovery (logdev: internal)
 [  979.225078] XFS (loop3): Bad dir block magic!
 [  979.225081] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
 [  979.225120] ------------[ cut here ]------------
 [  979.225122] WARNING: CPU: 1 PID: 157902 at fs/xfs/xfs_message.c:104 assfail+0x4e/0x68 [xfs]
 [  979.225349] Modules linked in: tls loop lcs ctcm fsm qeth ccwgroup zfcp qdio scsi_transport_fc dasd_fba_mod dasd_eckd_mod dasd_mod rfkill sunrpc vfio_ccw mdev vfio_iommu_type1 vfio drm fuse i2c_core drm_panel_orientation_quirks xfs libcrc32c ghash_s390 prng des_s390 virtio_net net_failover sha3_512_s390 virtio_blk failover sha3_256_s390 dm_mirror dm_region_hash dm_log dm_mod pkey zcrypt aes_s390
 [  979.225373] CPU: 1 PID: 157902 Comm: mount Kdump: loaded Not tainted 6.7.0-rc6 #1
 [  979.225376] Hardware name: IBM 8561 LT1 400 (KVM/Linux)
 [  979.225377] Krnl PSW : 0704c00180000000 000003ff7ffead7a (assfail+0x52/0x68 [xfs])
 [  979.225457]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
 [  979.225460] Krnl GPRS: c000000000000021 000003ff800549b0 ffffffffffffffea 000000000000000a
 [  979.225462]            0000038000363418 0000000000000000 000003ff80162b74 000000030001c127
 [  979.225464]            00000000ccc51d40 000000030001c127 00000000dfa03000 0000000163d19600
 [  979.225465]            000003ffa1f2ef68 000003ff80147c28 000003ff7ffead68 00000380003634c0
 [  979.225476] Krnl Code: 000003ff7ffead6e: 95001010		cli	16(%r1),0
 [  979.225476]            000003ff7ffead72: a774000a		brc	7,000003ff7ffead86
 [  979.225476]           #000003ff7ffead76: af000000		mc	0,0
 [  979.225476]           >000003ff7ffead7a: eb6ff0a80004	lmg	%r6,%r15,168(%r15)
 [  979.225476]            000003ff7ffead80: 07fe		bcr	15,%r14
 [  979.225476]            000003ff7ffead82: 47000700		bc	0,1792
 [  979.225476]            000003ff7ffead86: af000000		mc	0,0
 [  979.225476]            000003ff7ffead8a: 0707		bcr	0,%r7
 [  979.225487] Call Trace:
 [  979.225488]  [<000003ff7ffead7a>] assfail+0x52/0x68 [xfs] 
 [  979.225568] ([<000003ff7ffead68>] assfail+0x40/0x68 [xfs])
 [  979.225648]  [<000003ff80004b4e>] xlog_recover_validate_buf_type+0x2a6/0x5c8 [xfs] 
 [  979.225727]  [<000003ff80005eca>] xlog_recover_buf_commit_pass2+0x382/0x448 [xfs] 
 [  979.225807]  [<000003ff8001077a>] xlog_recover_items_pass2+0x72/0xf0 [xfs] 
 [  979.225886]  [<000003ff80011436>] xlog_recover_commit_trans+0x39e/0x3c0 [xfs] 
 [  979.225965]  [<000003ff80011598>] xlog_recovery_process_trans+0x140/0x148 [xfs] 
 [  979.226045]  [<000003ff80011660>] xlog_recover_process_ophdr+0xc0/0x180 [xfs] 
 [  979.226124]  [<000003ff80012126>] xlog_recover_process_data+0xb6/0x168 [xfs] 
 [  979.226204]  [<000003ff800122dc>] xlog_recover_process+0x104/0x150 [xfs] 
 [  979.226285]  [<000003ff800125a4>] xlog_do_recovery_pass+0x27c/0x748 [xfs] 
 [  979.226366]  [<000003ff80012ec8>] xlog_do_log_recovery+0x88/0xd8 [xfs] 
 [  979.226446]  [<000003ff80012f64>] xlog_do_recover+0x4c/0x218 [xfs] 
 [  979.226526]  [<000003ff8001446a>] xlog_recover+0xda/0x1a0 [xfs] 
 [  979.226607]  [<000003ff7fffa51e>] xfs_log_mount+0x11e/0x280 [xfs] 
 [  979.226686]  [<000003ff7ffec226>] xfs_mountfs+0x3e6/0x928 [xfs] 
 [  979.226765]  [<000003ff7fff3b4c>] xfs_fs_fill_super+0x40c/0x7d8 [xfs] 
 [  979.226844]  [<00000001ef53bfd4>] get_tree_bdev+0x144/0x1d0 
 [  979.226850]  [<00000001ef539a08>] vfs_get_tree+0x38/0x110 
 [  979.226852]  [<00000001ef56c432>] do_new_mount+0x17a/0x2d0 
 [  979.226855]  [<00000001ef56d0ac>] path_mount+0x1ac/0x818 
 [  979.226857]  [<00000001ef56d81c>] __s390x_sys_mount+0x104/0x148 
 [  979.226860]  [<00000001efbb6e00>] __do_syscall+0x1d0/0x1f8 
 [  979.226864]  [<00000001efbc7088>] system_call+0x70/0x98 
 [  979.226868] Last Breaking-Event-Address:
 [  979.226868]  [<000003ff7ffeabc4>] xfs_printk_level+0xac/0xd8 [xfs]
 [  979.226949] ---[ end trace 0000000000000000 ]---
 [  979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020 
 [  979.227732] XFS (loop3): Unmount and run xfs_repair
 [  979.227733] XFS (loop3): First 128 bytes of corrupted metadata buffer:
 [  979.227736] 00000000: 58 44 42 33 00 00 00 00 00 00 00 00 00 00 10 20  XDB3........... 
 [  979.227738] 00000010: 00 00 00 00 00 00 00 00 c1 95 44 38 a1 8d 4b 4a  ..........D8..KJ
 [  979.227739] 00000020: ad 32 0e 29 c4 07 13 ed 00 00 00 00 00 00 00 80  .2.)............
 [  979.227740] 00000030: 02 68 fc c8 00 00 00 00 00 00 00 00 00 00 00 00  .h..............
 [  979.227741] 00000040: 00 00 00 00 00 00 00 80 01 2e 02 00 00 00 00 40  ...............@
 [  979.227742] 00000050: 00 00 00 00 00 00 00 80 02 2e 2e 02 00 00 00 50  ...............P
 [  979.227743] 00000060: 00 00 00 00 00 00 00 83 08 66 73 73 32 36 30 38  .........fss2608
 [  979.227745] 00000070: 36 01 00 00 00 00 00 60 00 00 00 00 00 00 00 84  6......`........
 [  979.227748] XFS (loop3): Corruption of in-memory data (0x8) detected at __xfs_buf_submit+0x78/0x230 [xfs] (fs/xfs/xfs_buf.c:1558).  Shutting down filesystem.
 [  979.227826] XFS (loop3): Please unmount the filesystem and rectify the problem(s)
 [  979.227912] XFS (loop3): log mount/recovery failed: error -117
 [  979.228061] XFS (loop3): log mount failed
 [  980.224124] XFS (dm-3): Unmounting Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
 [  980.318601] XFS (loop0): Unmounting Filesystem c8777546-cb21-46f2-9963-456588f60b85
 [  980.353946] XFS (loop0): Mounting V5 Filesystem c8777546-cb21-46f2-9963-456588f60b85
 [  980.355207] XFS (loop0): Ending clean mount
 [  982.585362] XFS (loop1): Mounting V5 Filesystem ec19bbbd-e925-41c9-befe-28218e74a23b
 [  982.586881] XFS (loop1): Ending clean mount
 [  982.588944] XFS (loop1): Unmounting Filesystem ec19bbbd-e925-41c9-befe-28218e74a23b
 [  982.603160] XFS (loop0): EXPERIMENTAL online scrub feature in use. Use at your own risk!
 [  982.620838] XFS (loop0): Unmounting Filesystem c8777546-cb21-46f2-9963-456588f60b85
 [  982.636206] XFS (loop0): Mounting V5 Filesystem c8777546-cb21-46f2-9963-456588f60b85
 [  982.637231] XFS (loop0): Ending clean mount
 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply
  2023-12-18 14:01 [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply Zorro Lang
@ 2023-12-18 17:48 ` Darrick J. Wong
  2023-12-19  6:34   ` Zorro Lang
  2024-01-18  4:20 ` Dave Chinner
  1 sibling, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2023-12-18 17:48 UTC (permalink / raw)
  To: Zorro Lang; +Cc: linux-xfs, fstests

On Mon, Dec 18, 2023 at 10:01:34PM +0800, Zorro Lang wrote:
> Hi,
> 
> Recently I hit a crash [1] on s390x with 64k directory block size xfs
> (-n size=65536 -m crc=1,finobt=1,reflink=1,rmapbt=0,bigtime=1,inobtcount=1),
> even not panic, a assertion failure will happen.
> 
> I found it from an old downstream kernel at first, then reproduced it
> on latest upstream mainline linux (v6.7-rc6). Can't be sure how long
> time this issue be there, just reported it at first.
> 
> I hit and reproduced it several times by running generic/648 on s390x
> with "-n size=65536" xfs many times. Not sure if s390x is needed, I'll
> try to check other arches.

XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020

Where in the source code is this             ^^^^^^^^^^^^^^^^^^^^^^^^^^^?

Also, does "xfs: update dir3 leaf block metadata after swap" fix it?

(I suspect it won't because that's a dirent block, but for-next has a
few directory fixes in it.)

--D

> 
> Thanks,
> Zorro
> 
> [1]
>  [  899.271221] run fstests generic/648 at 2023-12-18 01:18:03
>  [  899.454444] XFS (loop1): Mounting V5 Filesystem 9be840e1-9bd1-468e-81c5-179f2ebdce56
>  [  899.455844] XFS (loop1): Ending clean mount
>  [  899.463623] XFS (loop1): Unmounting Filesystem 9be840e1-9bd1-468e-81c5-179f2ebdce56
>  [  899.587365] XFS (dm-3): Mounting V5 Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
>  [  899.588501] XFS (dm-3): Ending clean mount
>  ...
>  ...
>  [  974.217814] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
>  [  974.803146] XFS (loop3): Starting recovery (logdev: internal)
>  [  974.804634] XFS (loop3): Ending recovery (logdev: internal)
>  [  975.782734] Direct I/O collision with buffered writes! File: /p2/df/f4f Comm: fsstress
>  [  976.831942] lo_write_bvec: 15 callbacks suppressed
>  [  976.831946] loop: Write error at byte offset 1766129664, length 4096.
>  [  976.831953] I/O error, dev loop3, sector 3449468 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
>  [  976.831957] I/O error, dev loop3, sector 3449468 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
>  [  976.831960] XFS (loop3): log I/O error -5
>  [  976.831963] XFS (loop3): Filesystem has been shut down due to log error (0x2).
>  [  976.831964] XFS (loop3): Please unmount the filesystem and rectify the problem(s).
>  [  976.839603] buffer_io_error: 6 callbacks suppressed
>  [  976.839604] Buffer I/O error on dev dm-3, logical block 20971392, async page read
>  [  976.839607] Buffer I/O error on dev dm-3, logical block 20971393, async page read
>  [  976.839609] Buffer I/O error on dev dm-3, logical block 20971394, async page read
>  [  976.839611] Buffer I/O error on dev dm-3, logical block 20971395, async page read
>  [  976.839612] Buffer I/O error on dev dm-3, logical block 20971396, async page read
>  [  976.839613] Buffer I/O error on dev dm-3, logical block 20971397, async page read
>  [  976.839614] Buffer I/O error on dev dm-3, logical block 20971398, async page read
>  [  976.839616] Buffer I/O error on dev dm-3, logical block 20971399, async page read
>  [  977.419266] XFS (loop3): Unmounting Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
>  [  977.423981] iomap_finish_ioend: 15 callbacks suppressed
>  [  977.423982] dm-3: writeback error on inode 131, offset 1766125568, sector 4059696
>  [  977.423988] I/O error, dev loop3, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 2
>  [  978.434124] XFS (dm-3): Unmounting Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
>  [  978.449156] XFS (dm-3): log I/O error -5
>  [  978.449159] XFS (dm-3): Filesystem has been shut down due to log error (0x2).
>  [  978.449160] XFS (dm-3): Please unmount the filesystem and rectify the problem(s).
>  [  978.449163] XFS (dm-3): log I/O error -5
>  [  978.449164] XFS (dm-3): log I/O error -5
>  [  978.449165] XFS (dm-3): log I/O error -5
>  [  978.449166] XFS (dm-3): log I/O error -5
>  [  978.449167] XFS (dm-3): log I/O error -5
>  [  978.498053] XFS (dm-3): Mounting V5 Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
>  [  978.513331] XFS (dm-3): Starting recovery (logdev: internal)
>  [  978.584227] XFS (dm-3): Ending recovery (logdev: internal)
>  [  978.587276] loop3: detected capacity change from 0 to 6876346
>  [  978.591588] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
>  [  979.216565] XFS (loop3): Starting recovery (logdev: internal)
>  [  979.225078] XFS (loop3): Bad dir block magic!
>  [  979.225081] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
>  [  979.225120] ------------[ cut here ]------------
>  [  979.225122] WARNING: CPU: 1 PID: 157902 at fs/xfs/xfs_message.c:104 assfail+0x4e/0x68 [xfs]
>  [  979.225349] Modules linked in: tls loop lcs ctcm fsm qeth ccwgroup zfcp qdio scsi_transport_fc dasd_fba_mod dasd_eckd_mod dasd_mod rfkill sunrpc vfio_ccw mdev vfio_iommu_type1 vfio drm fuse i2c_core drm_panel_orientation_quirks xfs libcrc32c ghash_s390 prng des_s390 virtio_net net_failover sha3_512_s390 virtio_blk failover sha3_256_s390 dm_mirror dm_region_hash dm_log dm_mod pkey zcrypt aes_s390
>  [  979.225373] CPU: 1 PID: 157902 Comm: mount Kdump: loaded Not tainted 6.7.0-rc6 #1
>  [  979.225376] Hardware name: IBM 8561 LT1 400 (KVM/Linux)
>  [  979.225377] Krnl PSW : 0704c00180000000 000003ff7ffead7a (assfail+0x52/0x68 [xfs])
>  [  979.225457]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
>  [  979.225460] Krnl GPRS: c000000000000021 000003ff800549b0 ffffffffffffffea 000000000000000a
>  [  979.225462]            0000038000363418 0000000000000000 000003ff80162b74 000000030001c127
>  [  979.225464]            00000000ccc51d40 000000030001c127 00000000dfa03000 0000000163d19600
>  [  979.225465]            000003ffa1f2ef68 000003ff80147c28 000003ff7ffead68 00000380003634c0
>  [  979.225476] Krnl Code: 000003ff7ffead6e: 95001010		cli	16(%r1),0
>  [  979.225476]            000003ff7ffead72: a774000a		brc	7,000003ff7ffead86
>  [  979.225476]           #000003ff7ffead76: af000000		mc	0,0
>  [  979.225476]           >000003ff7ffead7a: eb6ff0a80004	lmg	%r6,%r15,168(%r15)
>  [  979.225476]            000003ff7ffead80: 07fe		bcr	15,%r14
>  [  979.225476]            000003ff7ffead82: 47000700		bc	0,1792
>  [  979.225476]            000003ff7ffead86: af000000		mc	0,0
>  [  979.225476]            000003ff7ffead8a: 0707		bcr	0,%r7
>  [  979.225487] Call Trace:
>  [  979.225488]  [<000003ff7ffead7a>] assfail+0x52/0x68 [xfs] 
>  [  979.225568] ([<000003ff7ffead68>] assfail+0x40/0x68 [xfs])
>  [  979.225648]  [<000003ff80004b4e>] xlog_recover_validate_buf_type+0x2a6/0x5c8 [xfs] 
>  [  979.225727]  [<000003ff80005eca>] xlog_recover_buf_commit_pass2+0x382/0x448 [xfs] 
>  [  979.225807]  [<000003ff8001077a>] xlog_recover_items_pass2+0x72/0xf0 [xfs] 
>  [  979.225886]  [<000003ff80011436>] xlog_recover_commit_trans+0x39e/0x3c0 [xfs] 
>  [  979.225965]  [<000003ff80011598>] xlog_recovery_process_trans+0x140/0x148 [xfs] 
>  [  979.226045]  [<000003ff80011660>] xlog_recover_process_ophdr+0xc0/0x180 [xfs] 
>  [  979.226124]  [<000003ff80012126>] xlog_recover_process_data+0xb6/0x168 [xfs] 
>  [  979.226204]  [<000003ff800122dc>] xlog_recover_process+0x104/0x150 [xfs] 
>  [  979.226285]  [<000003ff800125a4>] xlog_do_recovery_pass+0x27c/0x748 [xfs] 
>  [  979.226366]  [<000003ff80012ec8>] xlog_do_log_recovery+0x88/0xd8 [xfs] 
>  [  979.226446]  [<000003ff80012f64>] xlog_do_recover+0x4c/0x218 [xfs] 
>  [  979.226526]  [<000003ff8001446a>] xlog_recover+0xda/0x1a0 [xfs] 
>  [  979.226607]  [<000003ff7fffa51e>] xfs_log_mount+0x11e/0x280 [xfs] 
>  [  979.226686]  [<000003ff7ffec226>] xfs_mountfs+0x3e6/0x928 [xfs] 
>  [  979.226765]  [<000003ff7fff3b4c>] xfs_fs_fill_super+0x40c/0x7d8 [xfs] 
>  [  979.226844]  [<00000001ef53bfd4>] get_tree_bdev+0x144/0x1d0 
>  [  979.226850]  [<00000001ef539a08>] vfs_get_tree+0x38/0x110 
>  [  979.226852]  [<00000001ef56c432>] do_new_mount+0x17a/0x2d0 
>  [  979.226855]  [<00000001ef56d0ac>] path_mount+0x1ac/0x818 
>  [  979.226857]  [<00000001ef56d81c>] __s390x_sys_mount+0x104/0x148 
>  [  979.226860]  [<00000001efbb6e00>] __do_syscall+0x1d0/0x1f8 
>  [  979.226864]  [<00000001efbc7088>] system_call+0x70/0x98 
>  [  979.226868] Last Breaking-Event-Address:
>  [  979.226868]  [<000003ff7ffeabc4>] xfs_printk_level+0xac/0xd8 [xfs]
>  [  979.226949] ---[ end trace 0000000000000000 ]---
>  [  979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020 
>  [  979.227732] XFS (loop3): Unmount and run xfs_repair
>  [  979.227733] XFS (loop3): First 128 bytes of corrupted metadata buffer:
>  [  979.227736] 00000000: 58 44 42 33 00 00 00 00 00 00 00 00 00 00 10 20  XDB3........... 
>  [  979.227738] 00000010: 00 00 00 00 00 00 00 00 c1 95 44 38 a1 8d 4b 4a  ..........D8..KJ
>  [  979.227739] 00000020: ad 32 0e 29 c4 07 13 ed 00 00 00 00 00 00 00 80  .2.)............
>  [  979.227740] 00000030: 02 68 fc c8 00 00 00 00 00 00 00 00 00 00 00 00  .h..............
>  [  979.227741] 00000040: 00 00 00 00 00 00 00 80 01 2e 02 00 00 00 00 40  ...............@
>  [  979.227742] 00000050: 00 00 00 00 00 00 00 80 02 2e 2e 02 00 00 00 50  ...............P
>  [  979.227743] 00000060: 00 00 00 00 00 00 00 83 08 66 73 73 32 36 30 38  .........fss2608
>  [  979.227745] 00000070: 36 01 00 00 00 00 00 60 00 00 00 00 00 00 00 84  6......`........
>  [  979.227748] XFS (loop3): Corruption of in-memory data (0x8) detected at __xfs_buf_submit+0x78/0x230 [xfs] (fs/xfs/xfs_buf.c:1558).  Shutting down filesystem.
>  [  979.227826] XFS (loop3): Please unmount the filesystem and rectify the problem(s)
>  [  979.227912] XFS (loop3): log mount/recovery failed: error -117
>  [  979.228061] XFS (loop3): log mount failed
>  [  980.224124] XFS (dm-3): Unmounting Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
>  [  980.318601] XFS (loop0): Unmounting Filesystem c8777546-cb21-46f2-9963-456588f60b85
>  [  980.353946] XFS (loop0): Mounting V5 Filesystem c8777546-cb21-46f2-9963-456588f60b85
>  [  980.355207] XFS (loop0): Ending clean mount
>  [  982.585362] XFS (loop1): Mounting V5 Filesystem ec19bbbd-e925-41c9-befe-28218e74a23b
>  [  982.586881] XFS (loop1): Ending clean mount
>  [  982.588944] XFS (loop1): Unmounting Filesystem ec19bbbd-e925-41c9-befe-28218e74a23b
>  [  982.603160] XFS (loop0): EXPERIMENTAL online scrub feature in use. Use at your own risk!
>  [  982.620838] XFS (loop0): Unmounting Filesystem c8777546-cb21-46f2-9963-456588f60b85
>  [  982.636206] XFS (loop0): Mounting V5 Filesystem c8777546-cb21-46f2-9963-456588f60b85
>  [  982.637231] XFS (loop0): Ending clean mount
>  
> 
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply
  2023-12-18 17:48 ` Darrick J. Wong
@ 2023-12-19  6:34   ` Zorro Lang
  2023-12-25 13:38     ` Zorro Lang
  0 siblings, 1 reply; 7+ messages in thread
From: Zorro Lang @ 2023-12-19  6:34 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, fstests

On Mon, Dec 18, 2023 at 09:48:36AM -0800, Darrick J. Wong wrote:
> On Mon, Dec 18, 2023 at 10:01:34PM +0800, Zorro Lang wrote:
> > Hi,
> > 
> > Recently I hit a crash [1] on s390x with 64k directory block size xfs
> > (-n size=65536 -m crc=1,finobt=1,reflink=1,rmapbt=0,bigtime=1,inobtcount=1),
> > even not panic, a assertion failure will happen.
> > 
> > I found it from an old downstream kernel at first, then reproduced it
> > on latest upstream mainline linux (v6.7-rc6). Can't be sure how long
> > time this issue be there, just reported it at first.
> > 
> > I hit and reproduced it several times by running generic/648 on s390x
> > with "-n size=65536" xfs many times. Not sure if s390x is needed, I'll
> > try to check other arches.
> 
> XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020
> 
> Where in the source code is this             ^^^^^^^^^^^^^^^^^^^^^^^^^^^?

The source is just based on HEAD="v6.7-rc6", the decode_stacktrace.sh output is:
  [  979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check (fs/xfs/libxfs/xfs_dir2_data.c:137) xfs, xfs_dir3_block block 0x1020

The code is as below:

        switch (hdr->magic) {
        case cpu_to_be32(XFS_DIR3_BLOCK_MAGIC):
        case cpu_to_be32(XFS_DIR2_BLOCK_MAGIC):
                btp = xfs_dir2_block_tail_p(geo, hdr);
                lep = xfs_dir2_block_leaf_p(btp);

                if (be32_to_cpu(btp->count) >=
                    xfs_dir2_data_max_leaf_entries(geo))
137: ==>                 return __this_address;
                break;


More output as below:

[  979.225081] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
[  979.225120] ------------[ cut here ]------------
[  979.225122] WARNING: CPU: 1 PID: 157902 at fs/xfs/xfs_message.c:104 assfail (fs/xfs/xfs_message.c:104 (discriminator 3)) xfs
[  979.225349] Modules linked in: tls loop lcs ctcm fsm qeth ccwgroup zfcp qdio scsi_transport_fc dasd_fba_mod dasd_eckd_mod dasd_mod rfkill sunrpc vfio_ccw mdev vfio_iommu_type1 vfio drm
+fuse i2c_core drm_panel_orientation_quirks xfs libcrc32c ghash_s390 prng des_s390 virtio_net net_failover sha3_512_s390 virtio_blk failover sha3_256_s390 dm_mirror dm_region_hash dm_log
+dm_mod pkey zcrypt aes_s390
[  979.225376] Hardware name: IBM 8561 LT1 400 (KVM/Linux)
[  979.225377] Krnl PSW : 0704c00180000000 000003ff7ffead7a (assfail+0x52/0x68 xfs])
[  979.225457]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[  979.225460] Krnl GPRS: c000000000000021 000003ff800549b0 ffffffffffffffea 000000000000000a
[  979.225462]            0000038000363418 0000000000000000 000003ff80162b74 000000030001c127
[  979.225464]            00000000ccc51d40 000000030001c127 00000000dfa03000 0000000163d19600
[  979.225465]            000003ffa1f2ef68 000003ff80147c28 000003ff7ffead68 00000380003634c0
[ 979.225476] Krnl Code: 000003ff7ffead6e: 95001010 cli 16(%r1),0
All code
========

Code starting with the faulting instruction
===========================================
[  979.225476]            000003ff7ffead72: a774000a           brc     7,000003ff7ffead86
[  979.225476]           #000003ff7ffead76: af000000           mc      0,0
[  979.225476]           >000003ff7ffead7a: eb6ff0a80004       lmg     %r6,%r15,168(%r15)
[  979.225476]            000003ff7ffead80: 07fe               bcr     15,%r14
[  979.225476]            000003ff7ffead82: 47000700           bc      0,1792
[  979.225476]            000003ff7ffead86: af000000           mc      0,0
[  979.225476]            000003ff7ffead8a: 0707               bcr     0,%r7
[  979.225487] Call Trace:
[  979.225488] assfail (fs/xfs/xfs_message.c:105 (discriminator 3)) xfs
[  979.225568] assfail+0x40/0x68 xfs])
[  979.225648] xlog_recover_validate_buf_type (fs/xfs/xfs_buf_item_recover.c:414 (discriminator 1)) xfs
[  979.225727] xlog_recover_buf_commit_pass2 (fs/xfs/xfs_buf_item_recover.c:971) xfs
[  979.225807] xlog_recover_items_pass2 (fs/xfs/xfs_log_recover.c:1957) xfs
[  979.225886] xlog_recover_commit_trans (./include/linux/list.h:373 ./include/linux/list.h:588 fs/xfs/xfs_log_recover.c:2028) xfs
[  979.225965] xlog_recovery_process_trans (fs/xfs/xfs_log_recover.c:2257) xfs
[  979.226045] xlog_recover_process_ophdr (fs/xfs/xfs_log_recover.c:2405) xfs
[  979.226124] xlog_recover_process_data (fs/xfs/xfs_log_recover.c:2447) xfs
[  979.226204] xlog_recover_process (fs/xfs/xfs_log_recover.c:2901) xfs
[  979.226285] xlog_do_recovery_pass (fs/xfs/xfs_log_recover.c:3153) xfs
[  979.226366] xlog_do_log_recovery (fs/xfs/xfs_log_recover.c:3265) xfs
[  979.226446] xlog_do_recover (fs/xfs/xfs_log_recover.c:3293) xfs
[  979.226526] xlog_recover (./arch/s390/include/asm/atomic_ops.h:71 (discriminator 1) ./arch/s390/include/asm/bitops.h:63 (discriminator 1) ./include/asm-generic/bitops/instrumented-atomic.h:29 (discriminator 1) fs/xfs/xfs_log_recover.c:3427 (discriminator 1)) xfs
[  979.226607] xfs_log_mount (fs/xfs/xfs_log.c:718) xfs
[  979.226686] xfs_mountfs (fs/xfs/xfs_mount.c:822) xfs
[  979.226765] xfs_fs_fill_super (fs/xfs/xfs_super.c:1734) xfs
[  979.226844] get_tree_bdev (fs/super.c:1598) 
[  979.226850] vfs_get_tree (fs/super.c:1771) 
[  979.226852] do_new_mount (fs/namespace.c:3337) 
[  979.226855] path_mount (fs/namespace.c:3664) 
[  979.226857] __s390x_sys_mount (fs/namespace.c:3677 fs/namespace.c:3886 fs/namespace.c:3863 fs/namespace.c:3863) 
[  979.226860] __do_syscall (arch/s390/kernel/syscall.c:144 arch/s390/kernel/syscall.c:168) 
[  979.226864] system_call (arch/s390/kernel/entry.S:309) 
[  979.226868] Last Breaking-Event-Address:
[  979.226868] xfs_printk_level (fs/xfs/xfs_message.c:52) xfs
[  979.226949] ---[ end trace 0000000000000000 ]---
[  979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check (fs/xfs/libxfs/xfs_dir2_data.c:137) xfs, xfs_dir3_block block 0x1020
[  979.227732] XFS (loop3): Unmount and run xfs_repair
[  979.227733] XFS (loop3): First 128 bytes of corrupted metadata buffer:
[  979.227736] 00000000: 58 44 42 33 00 00 00 00 00 00 00 00 00 00 10 20  XDB3...........
[  979.227738] 00000010: 00 00 00 00 00 00 00 00 c1 95 44 38 a1 8d 4b 4a  ..........D8..KJ
[  979.227739] 00000020: ad 32 0e 29 c4 07 13 ed 00 00 00 00 00 00 00 80  .2.)............
[  979.227740] 00000030: 02 68 fc c8 00 00 00 00 00 00 00 00 00 00 00 00  .h..............
[  979.227741] 00000040: 00 00 00 00 00 00 00 80 01 2e 02 00 00 00 00 40  ...............@
[  979.227742] 00000050: 00 00 00 00 00 00 00 80 02 2e 2e 02 00 00 00 50  ...............P
[  979.227743] 00000060: 00 00 00 00 00 00 00 83 08 66 73 73 32 36 30 38  .........fss2608
[  979.227745] 00000070: 36 01 00 00 00 00 00 60 00 00 00 00 00 00 00 84  6......`........
[  979.227748] XFS (loop3): Corruption of in-memory data (0x8) detected at __xfs_buf_submit+0x78/0x230 [xfs] (fs/xfs/xfs_buf.c:1558). Shutting down filesystem.




> 
> Also, does "xfs: update dir3 leaf block metadata after swap" fix it?

OK, I'll merge and give it a try.

Thanks,
Zorro

> 
> (I suspect it won't because that's a dirent block, but for-next has a
> few directory fixes in it.)
> 
> --D
> 
> > 
> > Thanks,
> > Zorro
> > 
> > [1]
> >  [  899.271221] run fstests generic/648 at 2023-12-18 01:18:03
> >  [  899.454444] XFS (loop1): Mounting V5 Filesystem 9be840e1-9bd1-468e-81c5-179f2ebdce56
> >  [  899.455844] XFS (loop1): Ending clean mount
> >  [  899.463623] XFS (loop1): Unmounting Filesystem 9be840e1-9bd1-468e-81c5-179f2ebdce56
> >  [  899.587365] XFS (dm-3): Mounting V5 Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
> >  [  899.588501] XFS (dm-3): Ending clean mount
> >  ...
> >  ...
> >  [  974.217814] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
> >  [  974.803146] XFS (loop3): Starting recovery (logdev: internal)
> >  [  974.804634] XFS (loop3): Ending recovery (logdev: internal)
> >  [  975.782734] Direct I/O collision with buffered writes! File: /p2/df/f4f Comm: fsstress
> >  [  976.831942] lo_write_bvec: 15 callbacks suppressed
> >  [  976.831946] loop: Write error at byte offset 1766129664, length 4096.
> >  [  976.831953] I/O error, dev loop3, sector 3449468 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
> >  [  976.831957] I/O error, dev loop3, sector 3449468 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
> >  [  976.831960] XFS (loop3): log I/O error -5
> >  [  976.831963] XFS (loop3): Filesystem has been shut down due to log error (0x2).
> >  [  976.831964] XFS (loop3): Please unmount the filesystem and rectify the problem(s).
> >  [  976.839603] buffer_io_error: 6 callbacks suppressed
> >  [  976.839604] Buffer I/O error on dev dm-3, logical block 20971392, async page read
> >  [  976.839607] Buffer I/O error on dev dm-3, logical block 20971393, async page read
> >  [  976.839609] Buffer I/O error on dev dm-3, logical block 20971394, async page read
> >  [  976.839611] Buffer I/O error on dev dm-3, logical block 20971395, async page read
> >  [  976.839612] Buffer I/O error on dev dm-3, logical block 20971396, async page read
> >  [  976.839613] Buffer I/O error on dev dm-3, logical block 20971397, async page read
> >  [  976.839614] Buffer I/O error on dev dm-3, logical block 20971398, async page read
> >  [  976.839616] Buffer I/O error on dev dm-3, logical block 20971399, async page read
> >  [  977.419266] XFS (loop3): Unmounting Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
> >  [  977.423981] iomap_finish_ioend: 15 callbacks suppressed
> >  [  977.423982] dm-3: writeback error on inode 131, offset 1766125568, sector 4059696
> >  [  977.423988] I/O error, dev loop3, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 2
> >  [  978.434124] XFS (dm-3): Unmounting Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
> >  [  978.449156] XFS (dm-3): log I/O error -5
> >  [  978.449159] XFS (dm-3): Filesystem has been shut down due to log error (0x2).
> >  [  978.449160] XFS (dm-3): Please unmount the filesystem and rectify the problem(s).
> >  [  978.449163] XFS (dm-3): log I/O error -5
> >  [  978.449164] XFS (dm-3): log I/O error -5
> >  [  978.449165] XFS (dm-3): log I/O error -5
> >  [  978.449166] XFS (dm-3): log I/O error -5
> >  [  978.449167] XFS (dm-3): log I/O error -5
> >  [  978.498053] XFS (dm-3): Mounting V5 Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
> >  [  978.513331] XFS (dm-3): Starting recovery (logdev: internal)
> >  [  978.584227] XFS (dm-3): Ending recovery (logdev: internal)
> >  [  978.587276] loop3: detected capacity change from 0 to 6876346
> >  [  978.591588] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
> >  [  979.216565] XFS (loop3): Starting recovery (logdev: internal)
> >  [  979.225078] XFS (loop3): Bad dir block magic!
> >  [  979.225081] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
> >  [  979.225120] ------------[ cut here ]------------
> >  [  979.225122] WARNING: CPU: 1 PID: 157902 at fs/xfs/xfs_message.c:104 assfail+0x4e/0x68 [xfs]
> >  [  979.225349] Modules linked in: tls loop lcs ctcm fsm qeth ccwgroup zfcp qdio scsi_transport_fc dasd_fba_mod dasd_eckd_mod dasd_mod rfkill sunrpc vfio_ccw mdev vfio_iommu_type1 vfio drm fuse i2c_core drm_panel_orientation_quirks xfs libcrc32c ghash_s390 prng des_s390 virtio_net net_failover sha3_512_s390 virtio_blk failover sha3_256_s390 dm_mirror dm_region_hash dm_log dm_mod pkey zcrypt aes_s390
> >  [  979.225373] CPU: 1 PID: 157902 Comm: mount Kdump: loaded Not tainted 6.7.0-rc6 #1
> >  [  979.225376] Hardware name: IBM 8561 LT1 400 (KVM/Linux)
> >  [  979.225377] Krnl PSW : 0704c00180000000 000003ff7ffead7a (assfail+0x52/0x68 [xfs])
> >  [  979.225457]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> >  [  979.225460] Krnl GPRS: c000000000000021 000003ff800549b0 ffffffffffffffea 000000000000000a
> >  [  979.225462]            0000038000363418 0000000000000000 000003ff80162b74 000000030001c127
> >  [  979.225464]            00000000ccc51d40 000000030001c127 00000000dfa03000 0000000163d19600
> >  [  979.225465]            000003ffa1f2ef68 000003ff80147c28 000003ff7ffead68 00000380003634c0
> >  [  979.225476] Krnl Code: 000003ff7ffead6e: 95001010		cli	16(%r1),0
> >  [  979.225476]            000003ff7ffead72: a774000a		brc	7,000003ff7ffead86
> >  [  979.225476]           #000003ff7ffead76: af000000		mc	0,0
> >  [  979.225476]           >000003ff7ffead7a: eb6ff0a80004	lmg	%r6,%r15,168(%r15)
> >  [  979.225476]            000003ff7ffead80: 07fe		bcr	15,%r14
> >  [  979.225476]            000003ff7ffead82: 47000700		bc	0,1792
> >  [  979.225476]            000003ff7ffead86: af000000		mc	0,0
> >  [  979.225476]            000003ff7ffead8a: 0707		bcr	0,%r7
> >  [  979.225487] Call Trace:
> >  [  979.225488]  [<000003ff7ffead7a>] assfail+0x52/0x68 [xfs] 
> >  [  979.225568] ([<000003ff7ffead68>] assfail+0x40/0x68 [xfs])
> >  [  979.225648]  [<000003ff80004b4e>] xlog_recover_validate_buf_type+0x2a6/0x5c8 [xfs] 
> >  [  979.225727]  [<000003ff80005eca>] xlog_recover_buf_commit_pass2+0x382/0x448 [xfs] 
> >  [  979.225807]  [<000003ff8001077a>] xlog_recover_items_pass2+0x72/0xf0 [xfs] 
> >  [  979.225886]  [<000003ff80011436>] xlog_recover_commit_trans+0x39e/0x3c0 [xfs] 
> >  [  979.225965]  [<000003ff80011598>] xlog_recovery_process_trans+0x140/0x148 [xfs] 
> >  [  979.226045]  [<000003ff80011660>] xlog_recover_process_ophdr+0xc0/0x180 [xfs] 
> >  [  979.226124]  [<000003ff80012126>] xlog_recover_process_data+0xb6/0x168 [xfs] 
> >  [  979.226204]  [<000003ff800122dc>] xlog_recover_process+0x104/0x150 [xfs] 
> >  [  979.226285]  [<000003ff800125a4>] xlog_do_recovery_pass+0x27c/0x748 [xfs] 
> >  [  979.226366]  [<000003ff80012ec8>] xlog_do_log_recovery+0x88/0xd8 [xfs] 
> >  [  979.226446]  [<000003ff80012f64>] xlog_do_recover+0x4c/0x218 [xfs] 
> >  [  979.226526]  [<000003ff8001446a>] xlog_recover+0xda/0x1a0 [xfs] 
> >  [  979.226607]  [<000003ff7fffa51e>] xfs_log_mount+0x11e/0x280 [xfs] 
> >  [  979.226686]  [<000003ff7ffec226>] xfs_mountfs+0x3e6/0x928 [xfs] 
> >  [  979.226765]  [<000003ff7fff3b4c>] xfs_fs_fill_super+0x40c/0x7d8 [xfs] 
> >  [  979.226844]  [<00000001ef53bfd4>] get_tree_bdev+0x144/0x1d0 
> >  [  979.226850]  [<00000001ef539a08>] vfs_get_tree+0x38/0x110 
> >  [  979.226852]  [<00000001ef56c432>] do_new_mount+0x17a/0x2d0 
> >  [  979.226855]  [<00000001ef56d0ac>] path_mount+0x1ac/0x818 
> >  [  979.226857]  [<00000001ef56d81c>] __s390x_sys_mount+0x104/0x148 
> >  [  979.226860]  [<00000001efbb6e00>] __do_syscall+0x1d0/0x1f8 
> >  [  979.226864]  [<00000001efbc7088>] system_call+0x70/0x98 
> >  [  979.226868] Last Breaking-Event-Address:
> >  [  979.226868]  [<000003ff7ffeabc4>] xfs_printk_level+0xac/0xd8 [xfs]
> >  [  979.226949] ---[ end trace 0000000000000000 ]---
> >  [  979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020 
> >  [  979.227732] XFS (loop3): Unmount and run xfs_repair
> >  [  979.227733] XFS (loop3): First 128 bytes of corrupted metadata buffer:
> >  [  979.227736] 00000000: 58 44 42 33 00 00 00 00 00 00 00 00 00 00 10 20  XDB3........... 
> >  [  979.227738] 00000010: 00 00 00 00 00 00 00 00 c1 95 44 38 a1 8d 4b 4a  ..........D8..KJ
> >  [  979.227739] 00000020: ad 32 0e 29 c4 07 13 ed 00 00 00 00 00 00 00 80  .2.)............
> >  [  979.227740] 00000030: 02 68 fc c8 00 00 00 00 00 00 00 00 00 00 00 00  .h..............
> >  [  979.227741] 00000040: 00 00 00 00 00 00 00 80 01 2e 02 00 00 00 00 40  ...............@
> >  [  979.227742] 00000050: 00 00 00 00 00 00 00 80 02 2e 2e 02 00 00 00 50  ...............P
> >  [  979.227743] 00000060: 00 00 00 00 00 00 00 83 08 66 73 73 32 36 30 38  .........fss2608
> >  [  979.227745] 00000070: 36 01 00 00 00 00 00 60 00 00 00 00 00 00 00 84  6......`........
> >  [  979.227748] XFS (loop3): Corruption of in-memory data (0x8) detected at __xfs_buf_submit+0x78/0x230 [xfs] (fs/xfs/xfs_buf.c:1558).  Shutting down filesystem.
> >  [  979.227826] XFS (loop3): Please unmount the filesystem and rectify the problem(s)
> >  [  979.227912] XFS (loop3): log mount/recovery failed: error -117
> >  [  979.228061] XFS (loop3): log mount failed
> >  [  980.224124] XFS (dm-3): Unmounting Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
> >  [  980.318601] XFS (loop0): Unmounting Filesystem c8777546-cb21-46f2-9963-456588f60b85
> >  [  980.353946] XFS (loop0): Mounting V5 Filesystem c8777546-cb21-46f2-9963-456588f60b85
> >  [  980.355207] XFS (loop0): Ending clean mount
> >  [  982.585362] XFS (loop1): Mounting V5 Filesystem ec19bbbd-e925-41c9-befe-28218e74a23b
> >  [  982.586881] XFS (loop1): Ending clean mount
> >  [  982.588944] XFS (loop1): Unmounting Filesystem ec19bbbd-e925-41c9-befe-28218e74a23b
> >  [  982.603160] XFS (loop0): EXPERIMENTAL online scrub feature in use. Use at your own risk!
> >  [  982.620838] XFS (loop0): Unmounting Filesystem c8777546-cb21-46f2-9963-456588f60b85
> >  [  982.636206] XFS (loop0): Mounting V5 Filesystem c8777546-cb21-46f2-9963-456588f60b85
> >  [  982.637231] XFS (loop0): Ending clean mount
> >  
> > 
> > 
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply
  2023-12-19  6:34   ` Zorro Lang
@ 2023-12-25 13:38     ` Zorro Lang
  2024-01-04  4:35       ` Darrick J. Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Zorro Lang @ 2023-12-25 13:38 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, fstests

On Tue, Dec 19, 2023 at 02:34:20PM +0800, Zorro Lang wrote:
> On Mon, Dec 18, 2023 at 09:48:36AM -0800, Darrick J. Wong wrote:
> > On Mon, Dec 18, 2023 at 10:01:34PM +0800, Zorro Lang wrote:
> > > Hi,
> > > 
> > > Recently I hit a crash [1] on s390x with 64k directory block size xfs
> > > (-n size=65536 -m crc=1,finobt=1,reflink=1,rmapbt=0,bigtime=1,inobtcount=1),
> > > even not panic, a assertion failure will happen.
> > > 
> > > I found it from an old downstream kernel at first, then reproduced it
> > > on latest upstream mainline linux (v6.7-rc6). Can't be sure how long
> > > time this issue be there, just reported it at first.
> > > 
> > > I hit and reproduced it several times by running generic/648 on s390x
> > > with "-n size=65536" xfs many times. Not sure if s390x is needed, I'll
> > > try to check other arches.
> > 
> > XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020
> > 
> > Where in the source code is this             ^^^^^^^^^^^^^^^^^^^^^^^^^^^?
> 
> The source is just based on HEAD="v6.7-rc6", the decode_stacktrace.sh output is:
>   [  979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check (fs/xfs/libxfs/xfs_dir2_data.c:137) xfs, xfs_dir3_block block 0x1020
> 
> The code is as below:
> 
>         switch (hdr->magic) {
>         case cpu_to_be32(XFS_DIR3_BLOCK_MAGIC):
>         case cpu_to_be32(XFS_DIR2_BLOCK_MAGIC):
>                 btp = xfs_dir2_block_tail_p(geo, hdr);
>                 lep = xfs_dir2_block_leaf_p(btp);
> 
>                 if (be32_to_cpu(btp->count) >=
>                     xfs_dir2_data_max_leaf_entries(geo))
> 137: ==>                 return __this_address;
>                 break;
> 
> 
> More output as below:
> 
> [  979.225081] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
> [  979.225120] ------------[ cut here ]------------
> [  979.225122] WARNING: CPU: 1 PID: 157902 at fs/xfs/xfs_message.c:104 assfail (fs/xfs/xfs_message.c:104 (discriminator 3)) xfs
> [  979.225349] Modules linked in: tls loop lcs ctcm fsm qeth ccwgroup zfcp qdio scsi_transport_fc dasd_fba_mod dasd_eckd_mod dasd_mod rfkill sunrpc vfio_ccw mdev vfio_iommu_type1 vfio drm
> +fuse i2c_core drm_panel_orientation_quirks xfs libcrc32c ghash_s390 prng des_s390 virtio_net net_failover sha3_512_s390 virtio_blk failover sha3_256_s390 dm_mirror dm_region_hash dm_log
> +dm_mod pkey zcrypt aes_s390
> [  979.225376] Hardware name: IBM 8561 LT1 400 (KVM/Linux)
> [  979.225377] Krnl PSW : 0704c00180000000 000003ff7ffead7a (assfail+0x52/0x68 xfs])
> [  979.225457]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> [  979.225460] Krnl GPRS: c000000000000021 000003ff800549b0 ffffffffffffffea 000000000000000a
> [  979.225462]            0000038000363418 0000000000000000 000003ff80162b74 000000030001c127
> [  979.225464]            00000000ccc51d40 000000030001c127 00000000dfa03000 0000000163d19600
> [  979.225465]            000003ffa1f2ef68 000003ff80147c28 000003ff7ffead68 00000380003634c0
> [ 979.225476] Krnl Code: 000003ff7ffead6e: 95001010 cli 16(%r1),0
> All code
> ========
> 
> Code starting with the faulting instruction
> ===========================================
> [  979.225476]            000003ff7ffead72: a774000a           brc     7,000003ff7ffead86
> [  979.225476]           #000003ff7ffead76: af000000           mc      0,0
> [  979.225476]           >000003ff7ffead7a: eb6ff0a80004       lmg     %r6,%r15,168(%r15)
> [  979.225476]            000003ff7ffead80: 07fe               bcr     15,%r14
> [  979.225476]            000003ff7ffead82: 47000700           bc      0,1792
> [  979.225476]            000003ff7ffead86: af000000           mc      0,0
> [  979.225476]            000003ff7ffead8a: 0707               bcr     0,%r7
> [  979.225487] Call Trace:
> [  979.225488] assfail (fs/xfs/xfs_message.c:105 (discriminator 3)) xfs
> [  979.225568] assfail+0x40/0x68 xfs])
> [  979.225648] xlog_recover_validate_buf_type (fs/xfs/xfs_buf_item_recover.c:414 (discriminator 1)) xfs
> [  979.225727] xlog_recover_buf_commit_pass2 (fs/xfs/xfs_buf_item_recover.c:971) xfs
> [  979.225807] xlog_recover_items_pass2 (fs/xfs/xfs_log_recover.c:1957) xfs
> [  979.225886] xlog_recover_commit_trans (./include/linux/list.h:373 ./include/linux/list.h:588 fs/xfs/xfs_log_recover.c:2028) xfs
> [  979.225965] xlog_recovery_process_trans (fs/xfs/xfs_log_recover.c:2257) xfs
> [  979.226045] xlog_recover_process_ophdr (fs/xfs/xfs_log_recover.c:2405) xfs
> [  979.226124] xlog_recover_process_data (fs/xfs/xfs_log_recover.c:2447) xfs
> [  979.226204] xlog_recover_process (fs/xfs/xfs_log_recover.c:2901) xfs
> [  979.226285] xlog_do_recovery_pass (fs/xfs/xfs_log_recover.c:3153) xfs
> [  979.226366] xlog_do_log_recovery (fs/xfs/xfs_log_recover.c:3265) xfs
> [  979.226446] xlog_do_recover (fs/xfs/xfs_log_recover.c:3293) xfs
> [  979.226526] xlog_recover (./arch/s390/include/asm/atomic_ops.h:71 (discriminator 1) ./arch/s390/include/asm/bitops.h:63 (discriminator 1) ./include/asm-generic/bitops/instrumented-atomic.h:29 (discriminator 1) fs/xfs/xfs_log_recover.c:3427 (discriminator 1)) xfs
> [  979.226607] xfs_log_mount (fs/xfs/xfs_log.c:718) xfs
> [  979.226686] xfs_mountfs (fs/xfs/xfs_mount.c:822) xfs
> [  979.226765] xfs_fs_fill_super (fs/xfs/xfs_super.c:1734) xfs
> [  979.226844] get_tree_bdev (fs/super.c:1598) 
> [  979.226850] vfs_get_tree (fs/super.c:1771) 
> [  979.226852] do_new_mount (fs/namespace.c:3337) 
> [  979.226855] path_mount (fs/namespace.c:3664) 
> [  979.226857] __s390x_sys_mount (fs/namespace.c:3677 fs/namespace.c:3886 fs/namespace.c:3863 fs/namespace.c:3863) 
> [  979.226860] __do_syscall (arch/s390/kernel/syscall.c:144 arch/s390/kernel/syscall.c:168) 
> [  979.226864] system_call (arch/s390/kernel/entry.S:309) 
> [  979.226868] Last Breaking-Event-Address:
> [  979.226868] xfs_printk_level (fs/xfs/xfs_message.c:52) xfs
> [  979.226949] ---[ end trace 0000000000000000 ]---
> [  979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check (fs/xfs/libxfs/xfs_dir2_data.c:137) xfs, xfs_dir3_block block 0x1020
> [  979.227732] XFS (loop3): Unmount and run xfs_repair
> [  979.227733] XFS (loop3): First 128 bytes of corrupted metadata buffer:
> [  979.227736] 00000000: 58 44 42 33 00 00 00 00 00 00 00 00 00 00 10 20  XDB3...........
> [  979.227738] 00000010: 00 00 00 00 00 00 00 00 c1 95 44 38 a1 8d 4b 4a  ..........D8..KJ
> [  979.227739] 00000020: ad 32 0e 29 c4 07 13 ed 00 00 00 00 00 00 00 80  .2.)............
> [  979.227740] 00000030: 02 68 fc c8 00 00 00 00 00 00 00 00 00 00 00 00  .h..............
> [  979.227741] 00000040: 00 00 00 00 00 00 00 80 01 2e 02 00 00 00 00 40  ...............@
> [  979.227742] 00000050: 00 00 00 00 00 00 00 80 02 2e 2e 02 00 00 00 50  ...............P
> [  979.227743] 00000060: 00 00 00 00 00 00 00 83 08 66 73 73 32 36 30 38  .........fss2608
> [  979.227745] 00000070: 36 01 00 00 00 00 00 60 00 00 00 00 00 00 00 84  6......`........
> [  979.227748] XFS (loop3): Corruption of in-memory data (0x8) detected at __xfs_buf_submit+0x78/0x230 [xfs] (fs/xfs/xfs_buf.c:1558). Shutting down filesystem.
> 
> 
> 
> 
> > 
> > Also, does "xfs: update dir3 leaf block metadata after swap" fix it?
> 
> OK, I'll merge and give it a try.

It's still reproducible on xfs-linux for-next branch xfs-6.8-merge-2, which
contains the 5759aa4f9560 ("xfs: update dir3 leaf block metadata after swap")

Thanks,
Zorro

> 
> Thanks,
> Zorro
> 
> > 
> > (I suspect it won't because that's a dirent block, but for-next has a
> > few directory fixes in it.)
> > 
> > --D
> > 
> > > 
> > > Thanks,
> > > Zorro
> > > 
> > > [1]
> > >  [  899.271221] run fstests generic/648 at 2023-12-18 01:18:03
> > >  [  899.454444] XFS (loop1): Mounting V5 Filesystem 9be840e1-9bd1-468e-81c5-179f2ebdce56
> > >  [  899.455844] XFS (loop1): Ending clean mount
> > >  [  899.463623] XFS (loop1): Unmounting Filesystem 9be840e1-9bd1-468e-81c5-179f2ebdce56
> > >  [  899.587365] XFS (dm-3): Mounting V5 Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
> > >  [  899.588501] XFS (dm-3): Ending clean mount
> > >  ...
> > >  ...
> > >  [  974.217814] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
> > >  [  974.803146] XFS (loop3): Starting recovery (logdev: internal)
> > >  [  974.804634] XFS (loop3): Ending recovery (logdev: internal)
> > >  [  975.782734] Direct I/O collision with buffered writes! File: /p2/df/f4f Comm: fsstress
> > >  [  976.831942] lo_write_bvec: 15 callbacks suppressed
> > >  [  976.831946] loop: Write error at byte offset 1766129664, length 4096.
> > >  [  976.831953] I/O error, dev loop3, sector 3449468 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
> > >  [  976.831957] I/O error, dev loop3, sector 3449468 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
> > >  [  976.831960] XFS (loop3): log I/O error -5
> > >  [  976.831963] XFS (loop3): Filesystem has been shut down due to log error (0x2).
> > >  [  976.831964] XFS (loop3): Please unmount the filesystem and rectify the problem(s).
> > >  [  976.839603] buffer_io_error: 6 callbacks suppressed
> > >  [  976.839604] Buffer I/O error on dev dm-3, logical block 20971392, async page read
> > >  [  976.839607] Buffer I/O error on dev dm-3, logical block 20971393, async page read
> > >  [  976.839609] Buffer I/O error on dev dm-3, logical block 20971394, async page read
> > >  [  976.839611] Buffer I/O error on dev dm-3, logical block 20971395, async page read
> > >  [  976.839612] Buffer I/O error on dev dm-3, logical block 20971396, async page read
> > >  [  976.839613] Buffer I/O error on dev dm-3, logical block 20971397, async page read
> > >  [  976.839614] Buffer I/O error on dev dm-3, logical block 20971398, async page read
> > >  [  976.839616] Buffer I/O error on dev dm-3, logical block 20971399, async page read
> > >  [  977.419266] XFS (loop3): Unmounting Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
> > >  [  977.423981] iomap_finish_ioend: 15 callbacks suppressed
> > >  [  977.423982] dm-3: writeback error on inode 131, offset 1766125568, sector 4059696
> > >  [  977.423988] I/O error, dev loop3, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 2
> > >  [  978.434124] XFS (dm-3): Unmounting Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
> > >  [  978.449156] XFS (dm-3): log I/O error -5
> > >  [  978.449159] XFS (dm-3): Filesystem has been shut down due to log error (0x2).
> > >  [  978.449160] XFS (dm-3): Please unmount the filesystem and rectify the problem(s).
> > >  [  978.449163] XFS (dm-3): log I/O error -5
> > >  [  978.449164] XFS (dm-3): log I/O error -5
> > >  [  978.449165] XFS (dm-3): log I/O error -5
> > >  [  978.449166] XFS (dm-3): log I/O error -5
> > >  [  978.449167] XFS (dm-3): log I/O error -5
> > >  [  978.498053] XFS (dm-3): Mounting V5 Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
> > >  [  978.513331] XFS (dm-3): Starting recovery (logdev: internal)
> > >  [  978.584227] XFS (dm-3): Ending recovery (logdev: internal)
> > >  [  978.587276] loop3: detected capacity change from 0 to 6876346
> > >  [  978.591588] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
> > >  [  979.216565] XFS (loop3): Starting recovery (logdev: internal)
> > >  [  979.225078] XFS (loop3): Bad dir block magic!
> > >  [  979.225081] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
> > >  [  979.225120] ------------[ cut here ]------------
> > >  [  979.225122] WARNING: CPU: 1 PID: 157902 at fs/xfs/xfs_message.c:104 assfail+0x4e/0x68 [xfs]
> > >  [  979.225349] Modules linked in: tls loop lcs ctcm fsm qeth ccwgroup zfcp qdio scsi_transport_fc dasd_fba_mod dasd_eckd_mod dasd_mod rfkill sunrpc vfio_ccw mdev vfio_iommu_type1 vfio drm fuse i2c_core drm_panel_orientation_quirks xfs libcrc32c ghash_s390 prng des_s390 virtio_net net_failover sha3_512_s390 virtio_blk failover sha3_256_s390 dm_mirror dm_region_hash dm_log dm_mod pkey zcrypt aes_s390
> > >  [  979.225373] CPU: 1 PID: 157902 Comm: mount Kdump: loaded Not tainted 6.7.0-rc6 #1
> > >  [  979.225376] Hardware name: IBM 8561 LT1 400 (KVM/Linux)
> > >  [  979.225377] Krnl PSW : 0704c00180000000 000003ff7ffead7a (assfail+0x52/0x68 [xfs])
> > >  [  979.225457]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> > >  [  979.225460] Krnl GPRS: c000000000000021 000003ff800549b0 ffffffffffffffea 000000000000000a
> > >  [  979.225462]            0000038000363418 0000000000000000 000003ff80162b74 000000030001c127
> > >  [  979.225464]            00000000ccc51d40 000000030001c127 00000000dfa03000 0000000163d19600
> > >  [  979.225465]            000003ffa1f2ef68 000003ff80147c28 000003ff7ffead68 00000380003634c0
> > >  [  979.225476] Krnl Code: 000003ff7ffead6e: 95001010		cli	16(%r1),0
> > >  [  979.225476]            000003ff7ffead72: a774000a		brc	7,000003ff7ffead86
> > >  [  979.225476]           #000003ff7ffead76: af000000		mc	0,0
> > >  [  979.225476]           >000003ff7ffead7a: eb6ff0a80004	lmg	%r6,%r15,168(%r15)
> > >  [  979.225476]            000003ff7ffead80: 07fe		bcr	15,%r14
> > >  [  979.225476]            000003ff7ffead82: 47000700		bc	0,1792
> > >  [  979.225476]            000003ff7ffead86: af000000		mc	0,0
> > >  [  979.225476]            000003ff7ffead8a: 0707		bcr	0,%r7
> > >  [  979.225487] Call Trace:
> > >  [  979.225488]  [<000003ff7ffead7a>] assfail+0x52/0x68 [xfs] 
> > >  [  979.225568] ([<000003ff7ffead68>] assfail+0x40/0x68 [xfs])
> > >  [  979.225648]  [<000003ff80004b4e>] xlog_recover_validate_buf_type+0x2a6/0x5c8 [xfs] 
> > >  [  979.225727]  [<000003ff80005eca>] xlog_recover_buf_commit_pass2+0x382/0x448 [xfs] 
> > >  [  979.225807]  [<000003ff8001077a>] xlog_recover_items_pass2+0x72/0xf0 [xfs] 
> > >  [  979.225886]  [<000003ff80011436>] xlog_recover_commit_trans+0x39e/0x3c0 [xfs] 
> > >  [  979.225965]  [<000003ff80011598>] xlog_recovery_process_trans+0x140/0x148 [xfs] 
> > >  [  979.226045]  [<000003ff80011660>] xlog_recover_process_ophdr+0xc0/0x180 [xfs] 
> > >  [  979.226124]  [<000003ff80012126>] xlog_recover_process_data+0xb6/0x168 [xfs] 
> > >  [  979.226204]  [<000003ff800122dc>] xlog_recover_process+0x104/0x150 [xfs] 
> > >  [  979.226285]  [<000003ff800125a4>] xlog_do_recovery_pass+0x27c/0x748 [xfs] 
> > >  [  979.226366]  [<000003ff80012ec8>] xlog_do_log_recovery+0x88/0xd8 [xfs] 
> > >  [  979.226446]  [<000003ff80012f64>] xlog_do_recover+0x4c/0x218 [xfs] 
> > >  [  979.226526]  [<000003ff8001446a>] xlog_recover+0xda/0x1a0 [xfs] 
> > >  [  979.226607]  [<000003ff7fffa51e>] xfs_log_mount+0x11e/0x280 [xfs] 
> > >  [  979.226686]  [<000003ff7ffec226>] xfs_mountfs+0x3e6/0x928 [xfs] 
> > >  [  979.226765]  [<000003ff7fff3b4c>] xfs_fs_fill_super+0x40c/0x7d8 [xfs] 
> > >  [  979.226844]  [<00000001ef53bfd4>] get_tree_bdev+0x144/0x1d0 
> > >  [  979.226850]  [<00000001ef539a08>] vfs_get_tree+0x38/0x110 
> > >  [  979.226852]  [<00000001ef56c432>] do_new_mount+0x17a/0x2d0 
> > >  [  979.226855]  [<00000001ef56d0ac>] path_mount+0x1ac/0x818 
> > >  [  979.226857]  [<00000001ef56d81c>] __s390x_sys_mount+0x104/0x148 
> > >  [  979.226860]  [<00000001efbb6e00>] __do_syscall+0x1d0/0x1f8 
> > >  [  979.226864]  [<00000001efbc7088>] system_call+0x70/0x98 
> > >  [  979.226868] Last Breaking-Event-Address:
> > >  [  979.226868]  [<000003ff7ffeabc4>] xfs_printk_level+0xac/0xd8 [xfs]
> > >  [  979.226949] ---[ end trace 0000000000000000 ]---
> > >  [  979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020 
> > >  [  979.227732] XFS (loop3): Unmount and run xfs_repair
> > >  [  979.227733] XFS (loop3): First 128 bytes of corrupted metadata buffer:
> > >  [  979.227736] 00000000: 58 44 42 33 00 00 00 00 00 00 00 00 00 00 10 20  XDB3........... 
> > >  [  979.227738] 00000010: 00 00 00 00 00 00 00 00 c1 95 44 38 a1 8d 4b 4a  ..........D8..KJ
> > >  [  979.227739] 00000020: ad 32 0e 29 c4 07 13 ed 00 00 00 00 00 00 00 80  .2.)............
> > >  [  979.227740] 00000030: 02 68 fc c8 00 00 00 00 00 00 00 00 00 00 00 00  .h..............
> > >  [  979.227741] 00000040: 00 00 00 00 00 00 00 80 01 2e 02 00 00 00 00 40  ...............@
> > >  [  979.227742] 00000050: 00 00 00 00 00 00 00 80 02 2e 2e 02 00 00 00 50  ...............P
> > >  [  979.227743] 00000060: 00 00 00 00 00 00 00 83 08 66 73 73 32 36 30 38  .........fss2608
> > >  [  979.227745] 00000070: 36 01 00 00 00 00 00 60 00 00 00 00 00 00 00 84  6......`........
> > >  [  979.227748] XFS (loop3): Corruption of in-memory data (0x8) detected at __xfs_buf_submit+0x78/0x230 [xfs] (fs/xfs/xfs_buf.c:1558).  Shutting down filesystem.
> > >  [  979.227826] XFS (loop3): Please unmount the filesystem and rectify the problem(s)
> > >  [  979.227912] XFS (loop3): log mount/recovery failed: error -117
> > >  [  979.228061] XFS (loop3): log mount failed
> > >  [  980.224124] XFS (dm-3): Unmounting Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
> > >  [  980.318601] XFS (loop0): Unmounting Filesystem c8777546-cb21-46f2-9963-456588f60b85
> > >  [  980.353946] XFS (loop0): Mounting V5 Filesystem c8777546-cb21-46f2-9963-456588f60b85
> > >  [  980.355207] XFS (loop0): Ending clean mount
> > >  [  982.585362] XFS (loop1): Mounting V5 Filesystem ec19bbbd-e925-41c9-befe-28218e74a23b
> > >  [  982.586881] XFS (loop1): Ending clean mount
> > >  [  982.588944] XFS (loop1): Unmounting Filesystem ec19bbbd-e925-41c9-befe-28218e74a23b
> > >  [  982.603160] XFS (loop0): EXPERIMENTAL online scrub feature in use. Use at your own risk!
> > >  [  982.620838] XFS (loop0): Unmounting Filesystem c8777546-cb21-46f2-9963-456588f60b85
> > >  [  982.636206] XFS (loop0): Mounting V5 Filesystem c8777546-cb21-46f2-9963-456588f60b85
> > >  [  982.637231] XFS (loop0): Ending clean mount
> > >  
> > > 
> > > 
> > 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply
  2023-12-25 13:38     ` Zorro Lang
@ 2024-01-04  4:35       ` Darrick J. Wong
  2024-01-18  1:59         ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2024-01-04  4:35 UTC (permalink / raw)
  To: Zorro Lang; +Cc: linux-xfs, fstests

On Mon, Dec 25, 2023 at 09:38:54PM +0800, Zorro Lang wrote:
> On Tue, Dec 19, 2023 at 02:34:20PM +0800, Zorro Lang wrote:
> > On Mon, Dec 18, 2023 at 09:48:36AM -0800, Darrick J. Wong wrote:
> > > On Mon, Dec 18, 2023 at 10:01:34PM +0800, Zorro Lang wrote:
> > > > Hi,
> > > > 
> > > > Recently I hit a crash [1] on s390x with 64k directory block size xfs
> > > > (-n size=65536 -m crc=1,finobt=1,reflink=1,rmapbt=0,bigtime=1,inobtcount=1),
> > > > even not panic, a assertion failure will happen.
> > > > 
> > > > I found it from an old downstream kernel at first, then reproduced it
> > > > on latest upstream mainline linux (v6.7-rc6). Can't be sure how long
> > > > time this issue be there, just reported it at first.
> > > > 
> > > > I hit and reproduced it several times by running generic/648 on s390x
> > > > with "-n size=65536" xfs many times. Not sure if s390x is needed, I'll
> > > > try to check other arches.
> > > 
> > > XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020
> > > 
> > > Where in the source code is this             ^^^^^^^^^^^^^^^^^^^^^^^^^^^?
> > 
> > The source is just based on HEAD="v6.7-rc6", the decode_stacktrace.sh output is:
> >   [  979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check (fs/xfs/libxfs/xfs_dir2_data.c:137) xfs, xfs_dir3_block block 0x1020
> > 
> > The code is as below:
> > 
> >         switch (hdr->magic) {
> >         case cpu_to_be32(XFS_DIR3_BLOCK_MAGIC):
> >         case cpu_to_be32(XFS_DIR2_BLOCK_MAGIC):
> >                 btp = xfs_dir2_block_tail_p(geo, hdr);
> >                 lep = xfs_dir2_block_leaf_p(btp);
> > 
> >                 if (be32_to_cpu(btp->count) >=
> >                     xfs_dir2_data_max_leaf_entries(geo))
> > 137: ==>                 return __this_address;
> >                 break;
> > 
> > 
> > More output as below:
> > 
> > [  979.225081] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
> > [  979.225120] ------------[ cut here ]------------
> > [  979.225122] WARNING: CPU: 1 PID: 157902 at fs/xfs/xfs_message.c:104 assfail (fs/xfs/xfs_message.c:104 (discriminator 3)) xfs
> > [  979.225349] Modules linked in: tls loop lcs ctcm fsm qeth ccwgroup zfcp qdio scsi_transport_fc dasd_fba_mod dasd_eckd_mod dasd_mod rfkill sunrpc vfio_ccw mdev vfio_iommu_type1 vfio drm
> > +fuse i2c_core drm_panel_orientation_quirks xfs libcrc32c ghash_s390 prng des_s390 virtio_net net_failover sha3_512_s390 virtio_blk failover sha3_256_s390 dm_mirror dm_region_hash dm_log
> > +dm_mod pkey zcrypt aes_s390
> > [  979.225376] Hardware name: IBM 8561 LT1 400 (KVM/Linux)
> > [  979.225377] Krnl PSW : 0704c00180000000 000003ff7ffead7a (assfail+0x52/0x68 xfs])
> > [  979.225457]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> > [  979.225460] Krnl GPRS: c000000000000021 000003ff800549b0 ffffffffffffffea 000000000000000a
> > [  979.225462]            0000038000363418 0000000000000000 000003ff80162b74 000000030001c127
> > [  979.225464]            00000000ccc51d40 000000030001c127 00000000dfa03000 0000000163d19600
> > [  979.225465]            000003ffa1f2ef68 000003ff80147c28 000003ff7ffead68 00000380003634c0
> > [ 979.225476] Krnl Code: 000003ff7ffead6e: 95001010 cli 16(%r1),0
> > All code
> > ========
> > 
> > Code starting with the faulting instruction
> > ===========================================
> > [  979.225476]            000003ff7ffead72: a774000a           brc     7,000003ff7ffead86
> > [  979.225476]           #000003ff7ffead76: af000000           mc      0,0
> > [  979.225476]           >000003ff7ffead7a: eb6ff0a80004       lmg     %r6,%r15,168(%r15)
> > [  979.225476]            000003ff7ffead80: 07fe               bcr     15,%r14
> > [  979.225476]            000003ff7ffead82: 47000700           bc      0,1792
> > [  979.225476]            000003ff7ffead86: af000000           mc      0,0
> > [  979.225476]            000003ff7ffead8a: 0707               bcr     0,%r7
> > [  979.225487] Call Trace:
> > [  979.225488] assfail (fs/xfs/xfs_message.c:105 (discriminator 3)) xfs
> > [  979.225568] assfail+0x40/0x68 xfs])
> > [  979.225648] xlog_recover_validate_buf_type (fs/xfs/xfs_buf_item_recover.c:414 (discriminator 1)) xfs
> > [  979.225727] xlog_recover_buf_commit_pass2 (fs/xfs/xfs_buf_item_recover.c:971) xfs
> > [  979.225807] xlog_recover_items_pass2 (fs/xfs/xfs_log_recover.c:1957) xfs
> > [  979.225886] xlog_recover_commit_trans (./include/linux/list.h:373 ./include/linux/list.h:588 fs/xfs/xfs_log_recover.c:2028) xfs
> > [  979.225965] xlog_recovery_process_trans (fs/xfs/xfs_log_recover.c:2257) xfs
> > [  979.226045] xlog_recover_process_ophdr (fs/xfs/xfs_log_recover.c:2405) xfs
> > [  979.226124] xlog_recover_process_data (fs/xfs/xfs_log_recover.c:2447) xfs
> > [  979.226204] xlog_recover_process (fs/xfs/xfs_log_recover.c:2901) xfs
> > [  979.226285] xlog_do_recovery_pass (fs/xfs/xfs_log_recover.c:3153) xfs
> > [  979.226366] xlog_do_log_recovery (fs/xfs/xfs_log_recover.c:3265) xfs
> > [  979.226446] xlog_do_recover (fs/xfs/xfs_log_recover.c:3293) xfs
> > [  979.226526] xlog_recover (./arch/s390/include/asm/atomic_ops.h:71 (discriminator 1) ./arch/s390/include/asm/bitops.h:63 (discriminator 1) ./include/asm-generic/bitops/instrumented-atomic.h:29 (discriminator 1) fs/xfs/xfs_log_recover.c:3427 (discriminator 1)) xfs
> > [  979.226607] xfs_log_mount (fs/xfs/xfs_log.c:718) xfs
> > [  979.226686] xfs_mountfs (fs/xfs/xfs_mount.c:822) xfs
> > [  979.226765] xfs_fs_fill_super (fs/xfs/xfs_super.c:1734) xfs
> > [  979.226844] get_tree_bdev (fs/super.c:1598) 
> > [  979.226850] vfs_get_tree (fs/super.c:1771) 
> > [  979.226852] do_new_mount (fs/namespace.c:3337) 
> > [  979.226855] path_mount (fs/namespace.c:3664) 
> > [  979.226857] __s390x_sys_mount (fs/namespace.c:3677 fs/namespace.c:3886 fs/namespace.c:3863 fs/namespace.c:3863) 
> > [  979.226860] __do_syscall (arch/s390/kernel/syscall.c:144 arch/s390/kernel/syscall.c:168) 
> > [  979.226864] system_call (arch/s390/kernel/entry.S:309) 
> > [  979.226868] Last Breaking-Event-Address:
> > [  979.226868] xfs_printk_level (fs/xfs/xfs_message.c:52) xfs
> > [  979.226949] ---[ end trace 0000000000000000 ]---
> > [  979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check (fs/xfs/libxfs/xfs_dir2_data.c:137) xfs, xfs_dir3_block block 0x1020
> > [  979.227732] XFS (loop3): Unmount and run xfs_repair
> > [  979.227733] XFS (loop3): First 128 bytes of corrupted metadata buffer:
> > [  979.227736] 00000000: 58 44 42 33 00 00 00 00 00 00 00 00 00 00 10 20  XDB3...........
> > [  979.227738] 00000010: 00 00 00 00 00 00 00 00 c1 95 44 38 a1 8d 4b 4a  ..........D8..KJ
> > [  979.227739] 00000020: ad 32 0e 29 c4 07 13 ed 00 00 00 00 00 00 00 80  .2.)............
> > [  979.227740] 00000030: 02 68 fc c8 00 00 00 00 00 00 00 00 00 00 00 00  .h..............
> > [  979.227741] 00000040: 00 00 00 00 00 00 00 80 01 2e 02 00 00 00 00 40  ...............@
> > [  979.227742] 00000050: 00 00 00 00 00 00 00 80 02 2e 2e 02 00 00 00 50  ...............P
> > [  979.227743] 00000060: 00 00 00 00 00 00 00 83 08 66 73 73 32 36 30 38  .........fss2608
> > [  979.227745] 00000070: 36 01 00 00 00 00 00 60 00 00 00 00 00 00 00 84  6......`........
> > [  979.227748] XFS (loop3): Corruption of in-memory data (0x8) detected at __xfs_buf_submit+0x78/0x230 [xfs] (fs/xfs/xfs_buf.c:1558). Shutting down filesystem.
> > 
> > 
> > 
> > 
> > > 
> > > Also, does "xfs: update dir3 leaf block metadata after swap" fix it?
> > 
> > OK, I'll merge and give it a try.
> 
> It's still reproducible on xfs-linux for-next branch xfs-6.8-merge-2, which
> contains the 5759aa4f9560 ("xfs: update dir3 leaf block metadata after swap")

DOH.  Got a metadump?  I wonder if s390x is more fubar than we used to
think it was...

--D

> Thanks,
> Zorro
> 
> > 
> > Thanks,
> > Zorro
> > 
> > > 
> > > (I suspect it won't because that's a dirent block, but for-next has a
> > > few directory fixes in it.)
> > > 
> > > --D
> > > 
> > > > 
> > > > Thanks,
> > > > Zorro
> > > > 
> > > > [1]
> > > >  [  899.271221] run fstests generic/648 at 2023-12-18 01:18:03
> > > >  [  899.454444] XFS (loop1): Mounting V5 Filesystem 9be840e1-9bd1-468e-81c5-179f2ebdce56
> > > >  [  899.455844] XFS (loop1): Ending clean mount
> > > >  [  899.463623] XFS (loop1): Unmounting Filesystem 9be840e1-9bd1-468e-81c5-179f2ebdce56
> > > >  [  899.587365] XFS (dm-3): Mounting V5 Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
> > > >  [  899.588501] XFS (dm-3): Ending clean mount
> > > >  ...
> > > >  ...
> > > >  [  974.217814] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
> > > >  [  974.803146] XFS (loop3): Starting recovery (logdev: internal)
> > > >  [  974.804634] XFS (loop3): Ending recovery (logdev: internal)
> > > >  [  975.782734] Direct I/O collision with buffered writes! File: /p2/df/f4f Comm: fsstress
> > > >  [  976.831942] lo_write_bvec: 15 callbacks suppressed
> > > >  [  976.831946] loop: Write error at byte offset 1766129664, length 4096.
> > > >  [  976.831953] I/O error, dev loop3, sector 3449468 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
> > > >  [  976.831957] I/O error, dev loop3, sector 3449468 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
> > > >  [  976.831960] XFS (loop3): log I/O error -5
> > > >  [  976.831963] XFS (loop3): Filesystem has been shut down due to log error (0x2).
> > > >  [  976.831964] XFS (loop3): Please unmount the filesystem and rectify the problem(s).
> > > >  [  976.839603] buffer_io_error: 6 callbacks suppressed
> > > >  [  976.839604] Buffer I/O error on dev dm-3, logical block 20971392, async page read
> > > >  [  976.839607] Buffer I/O error on dev dm-3, logical block 20971393, async page read
> > > >  [  976.839609] Buffer I/O error on dev dm-3, logical block 20971394, async page read
> > > >  [  976.839611] Buffer I/O error on dev dm-3, logical block 20971395, async page read
> > > >  [  976.839612] Buffer I/O error on dev dm-3, logical block 20971396, async page read
> > > >  [  976.839613] Buffer I/O error on dev dm-3, logical block 20971397, async page read
> > > >  [  976.839614] Buffer I/O error on dev dm-3, logical block 20971398, async page read
> > > >  [  976.839616] Buffer I/O error on dev dm-3, logical block 20971399, async page read
> > > >  [  977.419266] XFS (loop3): Unmounting Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
> > > >  [  977.423981] iomap_finish_ioend: 15 callbacks suppressed
> > > >  [  977.423982] dm-3: writeback error on inode 131, offset 1766125568, sector 4059696
> > > >  [  977.423988] I/O error, dev loop3, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 2
> > > >  [  978.434124] XFS (dm-3): Unmounting Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
> > > >  [  978.449156] XFS (dm-3): log I/O error -5
> > > >  [  978.449159] XFS (dm-3): Filesystem has been shut down due to log error (0x2).
> > > >  [  978.449160] XFS (dm-3): Please unmount the filesystem and rectify the problem(s).
> > > >  [  978.449163] XFS (dm-3): log I/O error -5
> > > >  [  978.449164] XFS (dm-3): log I/O error -5
> > > >  [  978.449165] XFS (dm-3): log I/O error -5
> > > >  [  978.449166] XFS (dm-3): log I/O error -5
> > > >  [  978.449167] XFS (dm-3): log I/O error -5
> > > >  [  978.498053] XFS (dm-3): Mounting V5 Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
> > > >  [  978.513331] XFS (dm-3): Starting recovery (logdev: internal)
> > > >  [  978.584227] XFS (dm-3): Ending recovery (logdev: internal)
> > > >  [  978.587276] loop3: detected capacity change from 0 to 6876346
> > > >  [  978.591588] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
> > > >  [  979.216565] XFS (loop3): Starting recovery (logdev: internal)
> > > >  [  979.225078] XFS (loop3): Bad dir block magic!
> > > >  [  979.225081] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
> > > >  [  979.225120] ------------[ cut here ]------------
> > > >  [  979.225122] WARNING: CPU: 1 PID: 157902 at fs/xfs/xfs_message.c:104 assfail+0x4e/0x68 [xfs]
> > > >  [  979.225349] Modules linked in: tls loop lcs ctcm fsm qeth ccwgroup zfcp qdio scsi_transport_fc dasd_fba_mod dasd_eckd_mod dasd_mod rfkill sunrpc vfio_ccw mdev vfio_iommu_type1 vfio drm fuse i2c_core drm_panel_orientation_quirks xfs libcrc32c ghash_s390 prng des_s390 virtio_net net_failover sha3_512_s390 virtio_blk failover sha3_256_s390 dm_mirror dm_region_hash dm_log dm_mod pkey zcrypt aes_s390
> > > >  [  979.225373] CPU: 1 PID: 157902 Comm: mount Kdump: loaded Not tainted 6.7.0-rc6 #1
> > > >  [  979.225376] Hardware name: IBM 8561 LT1 400 (KVM/Linux)
> > > >  [  979.225377] Krnl PSW : 0704c00180000000 000003ff7ffead7a (assfail+0x52/0x68 [xfs])
> > > >  [  979.225457]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> > > >  [  979.225460] Krnl GPRS: c000000000000021 000003ff800549b0 ffffffffffffffea 000000000000000a
> > > >  [  979.225462]            0000038000363418 0000000000000000 000003ff80162b74 000000030001c127
> > > >  [  979.225464]            00000000ccc51d40 000000030001c127 00000000dfa03000 0000000163d19600
> > > >  [  979.225465]            000003ffa1f2ef68 000003ff80147c28 000003ff7ffead68 00000380003634c0
> > > >  [  979.225476] Krnl Code: 000003ff7ffead6e: 95001010		cli	16(%r1),0
> > > >  [  979.225476]            000003ff7ffead72: a774000a		brc	7,000003ff7ffead86
> > > >  [  979.225476]           #000003ff7ffead76: af000000		mc	0,0
> > > >  [  979.225476]           >000003ff7ffead7a: eb6ff0a80004	lmg	%r6,%r15,168(%r15)
> > > >  [  979.225476]            000003ff7ffead80: 07fe		bcr	15,%r14
> > > >  [  979.225476]            000003ff7ffead82: 47000700		bc	0,1792
> > > >  [  979.225476]            000003ff7ffead86: af000000		mc	0,0
> > > >  [  979.225476]            000003ff7ffead8a: 0707		bcr	0,%r7
> > > >  [  979.225487] Call Trace:
> > > >  [  979.225488]  [<000003ff7ffead7a>] assfail+0x52/0x68 [xfs] 
> > > >  [  979.225568] ([<000003ff7ffead68>] assfail+0x40/0x68 [xfs])
> > > >  [  979.225648]  [<000003ff80004b4e>] xlog_recover_validate_buf_type+0x2a6/0x5c8 [xfs] 
> > > >  [  979.225727]  [<000003ff80005eca>] xlog_recover_buf_commit_pass2+0x382/0x448 [xfs] 
> > > >  [  979.225807]  [<000003ff8001077a>] xlog_recover_items_pass2+0x72/0xf0 [xfs] 
> > > >  [  979.225886]  [<000003ff80011436>] xlog_recover_commit_trans+0x39e/0x3c0 [xfs] 
> > > >  [  979.225965]  [<000003ff80011598>] xlog_recovery_process_trans+0x140/0x148 [xfs] 
> > > >  [  979.226045]  [<000003ff80011660>] xlog_recover_process_ophdr+0xc0/0x180 [xfs] 
> > > >  [  979.226124]  [<000003ff80012126>] xlog_recover_process_data+0xb6/0x168 [xfs] 
> > > >  [  979.226204]  [<000003ff800122dc>] xlog_recover_process+0x104/0x150 [xfs] 
> > > >  [  979.226285]  [<000003ff800125a4>] xlog_do_recovery_pass+0x27c/0x748 [xfs] 
> > > >  [  979.226366]  [<000003ff80012ec8>] xlog_do_log_recovery+0x88/0xd8 [xfs] 
> > > >  [  979.226446]  [<000003ff80012f64>] xlog_do_recover+0x4c/0x218 [xfs] 
> > > >  [  979.226526]  [<000003ff8001446a>] xlog_recover+0xda/0x1a0 [xfs] 
> > > >  [  979.226607]  [<000003ff7fffa51e>] xfs_log_mount+0x11e/0x280 [xfs] 
> > > >  [  979.226686]  [<000003ff7ffec226>] xfs_mountfs+0x3e6/0x928 [xfs] 
> > > >  [  979.226765]  [<000003ff7fff3b4c>] xfs_fs_fill_super+0x40c/0x7d8 [xfs] 
> > > >  [  979.226844]  [<00000001ef53bfd4>] get_tree_bdev+0x144/0x1d0 
> > > >  [  979.226850]  [<00000001ef539a08>] vfs_get_tree+0x38/0x110 
> > > >  [  979.226852]  [<00000001ef56c432>] do_new_mount+0x17a/0x2d0 
> > > >  [  979.226855]  [<00000001ef56d0ac>] path_mount+0x1ac/0x818 
> > > >  [  979.226857]  [<00000001ef56d81c>] __s390x_sys_mount+0x104/0x148 
> > > >  [  979.226860]  [<00000001efbb6e00>] __do_syscall+0x1d0/0x1f8 
> > > >  [  979.226864]  [<00000001efbc7088>] system_call+0x70/0x98 
> > > >  [  979.226868] Last Breaking-Event-Address:
> > > >  [  979.226868]  [<000003ff7ffeabc4>] xfs_printk_level+0xac/0xd8 [xfs]
> > > >  [  979.226949] ---[ end trace 0000000000000000 ]---
> > > >  [  979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020 
> > > >  [  979.227732] XFS (loop3): Unmount and run xfs_repair
> > > >  [  979.227733] XFS (loop3): First 128 bytes of corrupted metadata buffer:
> > > >  [  979.227736] 00000000: 58 44 42 33 00 00 00 00 00 00 00 00 00 00 10 20  XDB3........... 
> > > >  [  979.227738] 00000010: 00 00 00 00 00 00 00 00 c1 95 44 38 a1 8d 4b 4a  ..........D8..KJ
> > > >  [  979.227739] 00000020: ad 32 0e 29 c4 07 13 ed 00 00 00 00 00 00 00 80  .2.)............
> > > >  [  979.227740] 00000030: 02 68 fc c8 00 00 00 00 00 00 00 00 00 00 00 00  .h..............
> > > >  [  979.227741] 00000040: 00 00 00 00 00 00 00 80 01 2e 02 00 00 00 00 40  ...............@
> > > >  [  979.227742] 00000050: 00 00 00 00 00 00 00 80 02 2e 2e 02 00 00 00 50  ...............P
> > > >  [  979.227743] 00000060: 00 00 00 00 00 00 00 83 08 66 73 73 32 36 30 38  .........fss2608
> > > >  [  979.227745] 00000070: 36 01 00 00 00 00 00 60 00 00 00 00 00 00 00 84  6......`........
> > > >  [  979.227748] XFS (loop3): Corruption of in-memory data (0x8) detected at __xfs_buf_submit+0x78/0x230 [xfs] (fs/xfs/xfs_buf.c:1558).  Shutting down filesystem.
> > > >  [  979.227826] XFS (loop3): Please unmount the filesystem and rectify the problem(s)
> > > >  [  979.227912] XFS (loop3): log mount/recovery failed: error -117
> > > >  [  979.228061] XFS (loop3): log mount failed
> > > >  [  980.224124] XFS (dm-3): Unmounting Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
> > > >  [  980.318601] XFS (loop0): Unmounting Filesystem c8777546-cb21-46f2-9963-456588f60b85
> > > >  [  980.353946] XFS (loop0): Mounting V5 Filesystem c8777546-cb21-46f2-9963-456588f60b85
> > > >  [  980.355207] XFS (loop0): Ending clean mount
> > > >  [  982.585362] XFS (loop1): Mounting V5 Filesystem ec19bbbd-e925-41c9-befe-28218e74a23b
> > > >  [  982.586881] XFS (loop1): Ending clean mount
> > > >  [  982.588944] XFS (loop1): Unmounting Filesystem ec19bbbd-e925-41c9-befe-28218e74a23b
> > > >  [  982.603160] XFS (loop0): EXPERIMENTAL online scrub feature in use. Use at your own risk!
> > > >  [  982.620838] XFS (loop0): Unmounting Filesystem c8777546-cb21-46f2-9963-456588f60b85
> > > >  [  982.636206] XFS (loop0): Mounting V5 Filesystem c8777546-cb21-46f2-9963-456588f60b85
> > > >  [  982.637231] XFS (loop0): Ending clean mount
> > > >  
> > > > 
> > > > 
> > > 
> 
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply
  2024-01-04  4:35       ` Darrick J. Wong
@ 2024-01-18  1:59         ` Dave Chinner
  0 siblings, 0 replies; 7+ messages in thread
From: Dave Chinner @ 2024-01-18  1:59 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Zorro Lang, linux-xfs, fstests

[ FYI, I missed this because I got the fstests list copy first, not
linux-xfs list copy and so it got filtered into fstests, not XFS.
please just send test failures like this to the linux-xfs list -
there is no value in sending them to fstests as well but it can
cause bug reports to "go missing". ]

On Wed, Jan 03, 2024 at 08:35:52PM -0800, Darrick J. Wong wrote:
> On Mon, Dec 25, 2023 at 09:38:54PM +0800, Zorro Lang wrote:
> > On Tue, Dec 19, 2023 at 02:34:20PM +0800, Zorro Lang wrote:
> > > > Also, does "xfs: update dir3 leaf block metadata after swap" fix it?
> > > 
> > > OK, I'll merge and give it a try.
> > 
> > It's still reproducible on xfs-linux for-next branch xfs-6.8-merge-2, which
> > contains the 5759aa4f9560 ("xfs: update dir3 leaf block metadata after swap")
> 
> DOH.  Got a metadump?  I wonder if s390x is more fubar than we used to
> think it was...

I'm betting that the directory corruption is being reported because
the directory block change was not replayed by recovery due to the
bad magic number error. The kernel was configured with
XFS_ASSERT_FATAL=n, so it continued on after the recovery failure
and things went bad when that unrecovered metadata was accessed.

Two things: a magic number mismatch between the buffer and the log
item should cause recovery to fail on production kernels, and what
we need to work out is how the buffer being recovered had a magic
number mismatch with the buf log item.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply
  2023-12-18 14:01 [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply Zorro Lang
  2023-12-18 17:48 ` Darrick J. Wong
@ 2024-01-18  4:20 ` Dave Chinner
  1 sibling, 0 replies; 7+ messages in thread
From: Dave Chinner @ 2024-01-18  4:20 UTC (permalink / raw)
  To: Zorro Lang; +Cc: linux-xfs, fstests

On Mon, Dec 18, 2023 at 10:01:34PM +0800, Zorro Lang wrote:
> Hi,
> 
> Recently I hit a crash [1] on s390x with 64k directory block size xfs
> (-n size=65536 -m crc=1,finobt=1,reflink=1,rmapbt=0,bigtime=1,inobtcount=1),
> even not panic, a assertion failure will happen.
> 
> I found it from an old downstream kernel at first, then reproduced it
> on latest upstream mainline linux (v6.7-rc6). Can't be sure how long
> time this issue be there, just reported it at first.
>  [  978.591588] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
>  [  979.216565] XFS (loop3): Starting recovery (logdev: internal)
>  [  979.225078] XFS (loop3): Bad dir block magic!
>  [  979.225081] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414

Ok, so we got a XFS_BLFT_DIR_BLOCK_BUF buf log item, but the object
that we recovered into the buffer did not have a
XFS_DIR3_BLOCK_MAGIC type.

Perhaps the buf log item didn't contain the first 128 bytes of the
buffer (or maybe any of it), and so didn't recovery the magic number?

Can you reproduce this with CONFIG_XFS_ASSERT_FATAL=y so the failure
preserves the journal contents when the issue triggers, then get a
metadump of the filesystem so I can dig into the contents of the
journal?  I really want to see what is in the buf log item we fail
to recover.

We don't want recovery to continue here because that will result in
the journal being fully recovered and updated and so we won't be
able to replay the recovery failure from it. 

i.e. if we leave the buffer we recovered in memory without failure
because the ASSERT is just a warn, we continue onwards and likely
then recover newer changes over the top of it. This may or may
not result in a correctly recovered buffer, depending on what parts
of the buffer got relogged.

IOWs, we should be expecting corruption to be detected somewhere
further down the track once we've seen this warning, and really we
should be aborting journal recovery if we see a mismatch like this.

.....

>  [  979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020 
>  [  979.227732] XFS (loop3): Unmount and run xfs_repair
>  [  979.227733] XFS (loop3): First 128 bytes of corrupted metadata buffer:
>  [  979.227736] 00000000: 58 44 42 33 00 00 00 00 00 00 00 00 00 00 10 20  XDB3........... 

XDB3 is XFS_DIR3_BLOCK_MAGIC, so it's the right type, but given it's
the tail pointer (btp->count) that is bad, this indicates that maybe
the tail didn't get written correctly by subsequent checkpoint
recoveries. We don't know, because that isn't in the output below.

It likely doesn't matter, because I think the problem is either a
runtime problem writing bad stuff into the journal, or a recovery
problem failing to handle the contents correctly. Hence the need for
a metadump.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-01-18  4:20 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-12-18 14:01 [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply Zorro Lang
2023-12-18 17:48 ` Darrick J. Wong
2023-12-19  6:34   ` Zorro Lang
2023-12-25 13:38     ` Zorro Lang
2024-01-04  4:35       ` Darrick J. Wong
2024-01-18  1:59         ` Dave Chinner
2024-01-18  4:20 ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox