* [Bug report][fstests generic/047] Internal error !(flags & XFS_DABUF_MAP_HOLE_OK) at line 2572 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_dabuf_map.constprop.0+0x26c/0x368 [xfs]
From: Zorro Lang @ 2023-10-29  4:11 UTC
To: linux-xfs; +Cc: fstests
Hi xfs list,
Recently I have been consistently hitting XFS corruption when running fstests
generic/047 [1], and it shows more failures in dmesg [2], e.g.:
XFS (loop1): Internal error !(flags & XFS_DABUF_MAP_HOLE_OK) at line 2572 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_dabuf_map.constprop.0+0x26c/0x368 [xfs]
The .full output contains lots of corruption messages [3].
So far I have only hit this issue on s390x; I haven't hit it on x86_64,
ppc64le or aarch64. The mkfs options aren't a factor, and even overlayfs
(based on XFS) still hits this issue.
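
By the way, decoding the corruption report below: br_startblock -2 is
HOLESTARTBLOCK, and, assuming the default 4k directory block size used here,
bno 8388608 is exactly the first directory leaf block (the leaf space starts
at a 32GiB logical offset). So the lookup is tripping over a hole right where
the directory's leaf/node blocks should start. A rough sketch of that
arithmetic (constants from my reading of the libxfs headers; the 4096-byte
block size is an assumption):

#include <stdio.h>

#define HOLESTARTBLOCK          (-2LL)          /* br_startblock -2 == a hole */
#define XFS_DIR2_LEAF_OFFSET    (32ULL << 30)   /* leaf space starts at 32 GiB */

int main(void)
{
        unsigned long long blksz = 4096;        /* assumed dir block size */

        /* first leaf block in the directory's logical address space */
        printf("first leaf bno = %llu\n", XFS_DIR2_LEAF_OFFSET / blksz);
        /* prints 8388608, matching "xfs_dabuf_map: bno 8388608" below */
        return 0;
}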
I tested on mainline Linux 6.6.0-rc7+; the HEAD commit is:
commit 750b95887e567848ac2c851dae47922cac6db946
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Thu Oct 26 20:42:02 2023 -1000
Merge tag 'drm-fixes-2023-10-27' of git://anongit.freedesktop.org/drm/drm
Thanks,
Zorro
[1]
FSTYP -- xfs (debug)
PLATFORM -- Linux/s390x s390x-kvm-091 6.6.0-rc7+ #1 SMP Fri Oct 27 08:58:03 EDT 2023
MKFS_OPTIONS -- -f -m crc=1,finobt=1,rmapbt=0,reflink=1,bigtime=1,inobtcount=1 /dev/loop1
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/loop1 /mnt/fstests/SCRATCH_DIR
generic/047 _check_xfs_filesystem: filesystem on /dev/loop1 failed scrub
(see /var/lib/xfstests/results//generic/047.full for details)
_check_xfs_filesystem: filesystem on /dev/loop1 is inconsistent (r)
(see /var/lib/xfstests/results//generic/047.full for details)
_check_dmesg: something found in dmesg (see /var/lib/xfstests/results//generic/047.dmesg)
- output mismatch (see /var/lib/xfstests/results//generic/047.out.bad)
--- tests/generic/047.out 2023-10-27 09:04:38.145351816 -0400
+++ /var/lib/xfstests/results//generic/047.out.bad 2023-10-27 09:08:12.475381462 -0400
@@ -1 +1,1000 @@
QA output created by 047
+file /mnt/fstests/SCRATCH_DIR/1 missing - fsync failed
+file /mnt/fstests/SCRATCH_DIR/2 missing - fsync failed
+file /mnt/fstests/SCRATCH_DIR/3 missing - fsync failed
+file /mnt/fstests/SCRATCH_DIR/4 missing - fsync failed
+file /mnt/fstests/SCRATCH_DIR/5 missing - fsync failed
+file /mnt/fstests/SCRATCH_DIR/6 missing - fsync failed
...
(Run 'diff -u /var/lib/xfstests/tests/generic/047.out /var/lib/xfstests/results//generic/047.out.bad' to see the entire diff)
Ran: generic/047
Failures: generic/047
Failed 1 of 1 tests
[2]
[ 376.468885] run fstests generic/047 at 2023-10-27 09:08:07
[ 376.675751] XFS (loop1): Mounting V5 Filesystem 716c9687-ee74-4c12-b6ad-a0b513194f2b
[ 376.677088] XFS (loop1): Ending clean mount
[ 376.678189] XFS (loop1): User initiated shutdown received.
[ 376.678194] XFS (loop1): Metadata I/O Error (0x4) detected at xfs_fs_goingdown+0x5a/0xf8 [xfs] (fs/xfs/xfs_fsops.c:492). Shutting down filesystem.
[ 376.678409] XFS (loop1): Please unmount the filesystem and rectify the problem(s)
[ 376.679423] XFS (loop1): Unmounting Filesystem 716c9687-ee74-4c12-b6ad-a0b513194f2b
[ 376.714910] XFS (loop1): Mounting V5 Filesystem 40196bb2-39f4-4c32-83ef-567f42216699
[ 376.716353] XFS (loop1): Ending clean mount
[ 380.375878] XFS (loop1): User initiated shutdown received.
[ 380.375888] XFS (loop1): Log I/O Error (0x6) detected at xfs_fs_goingdown+0xb4/0xf8 [xfs] (fs/xfs/xfs_fsops.c:495). Shutting down filesystem.
[ 380.376101] XFS (loop1): Please unmount the filesystem and rectify the problem(s)
[ 380.380373] XFS (loop1): Unmounting Filesystem 40196bb2-39f4-4c32-83ef-567f42216699
[ 380.383835] XFS (loop1): Mounting V5 Filesystem 40196bb2-39f4-4c32-83ef-567f42216699
[ 380.397086] XFS (loop1): Starting recovery (logdev: internal)
[ 380.465934] XFS (loop1): Ending recovery (logdev: internal)
[ 380.467409] XFS (loop1): Unmounting Filesystem 40196bb2-39f4-4c32-83ef-567f42216699
[ 380.475431] XFS (loop1): Mounting V5 Filesystem 40196bb2-39f4-4c32-83ef-567f42216699
[ 380.477235] XFS (loop1): Ending clean mount
[ 380.477500] XFS (loop1): Internal error !(flags & XFS_DABUF_MAP_HOLE_OK) at line 2572 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_dabuf_map.constprop.0+0x26c/0x368 [xfs]
[ 380.477636] CPU: 0 PID: 337362 Comm: 047 Kdump: loaded Tainted: G W 6.6.0-rc7+ #1
[ 380.477639] Hardware name: IBM 3931 LA1 400 (KVM/Linux)
[ 380.477641] Call Trace:
[ 380.477642] [<0000000032d71372>] dump_stack_lvl+0x62/0x80
[ 380.477648] [<000003ff7ff96c00>] xfs_corruption_error+0x70/0xa0 [xfs]
[ 380.477762] [<000003ff7ff551ce>] xfs_dabuf_map.constprop.0+0x2a6/0x368 [xfs]
[ 380.477871] [<000003ff7ff5773e>] xfs_da_read_buf+0x6e/0x128 [xfs]
[ 380.477977] [<000003ff7ff57838>] xfs_da3_node_read+0x40/0x78 [xfs]
[ 380.478085] [<000003ff7ff58c7a>] xfs_da3_node_lookup_int+0x82/0x558 [xfs]
[ 380.478193] [<000003ff7ff68d6e>] xfs_dir2_node_lookup+0x3e/0x140 [xfs]
[ 380.478302] [<000003ff7ff5cfd2>] xfs_dir_lookup+0x1ea/0x218 [xfs]
[ 380.478410] [<000003ff7fface40>] xfs_lookup+0x60/0x108 [xfs]
[ 380.478525] [<000003ff7ffa9d42>] xfs_vn_lookup+0x62/0x98 [xfs]
[ 380.478641] [<0000000032738cfa>] __lookup_slow+0x9a/0x148
[ 380.478647] [<000000003273dc86>] walk_component+0x126/0x1b8
[ 380.478650] [<000000003273e948>] path_lookupat+0x88/0x1e8
[ 380.478653] [<000000003273f66a>] filename_lookup+0xaa/0x198
[ 380.478656] [<00000000327306d0>] vfs_statx+0x90/0x160
[ 380.478659] [<0000000032730a1e>] vfs_fstatat+0x86/0xe8
[ 380.478661] [<0000000032730c98>] __do_sys_newfstatat+0x28/0x48
[ 380.478663] [<0000000032d97060>] __do_syscall+0x1d0/0x1f8
[ 380.478668] [<0000000032da7040>] system_call+0x70/0x98
[ 380.478672] XFS (loop1): Corruption detected. Unmount and run xfs_repair
[ 380.478674] XFS (loop1): xfs_dabuf_map: bno 8388608 inode 128
[ 380.478676] XFS (loop1): [00] br_startoff 8388608 br_startblock -2 br_blockcount 1 br_state 0
... (the same internal error, corruption report and call trace repeat many more times for PID 337362) ...
[ 381.349174] XFS (loop1): Corruption detected. Unmount and run xfs_repair
[ 381.349175] XFS (loop1): xfs_dabuf_map: bno 8388608 inode 128
[ 381.349177] XFS (loop1): [00] br_startoff 8388608 br_startblock -2 br_blockcount 1 br_state 0
[ 381.350972] XFS (loop1): Internal error !(flags & XFS_DABUF_MAP_HOLE_OK) at line 2572 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_dabuf_map.constprop.0+0x26c/0x368 [xfs]
[ 381.351085] CPU: 1 PID: 339644 Comm: xfs_scrub Kdump: loaded Tainted: G W 6.6.0-rc7+ #1
[ 381.351088] Hardware name: IBM 3931 LA1 400 (KVM/Linux)
[ 381.351090] Call Trace:
[ 381.351091] [<0000000032d71372>] dump_stack_lvl+0x62/0x80
[ 381.351093] [<000003ff7ff96c00>] xfs_corruption_error+0x70/0xa0 [xfs]
[ 381.351191] [<000003ff7ff551ce>] xfs_dabuf_map.constprop.0+0x2a6/0x368 [xfs]
[ 381.351283] [<000003ff7ff5773e>] xfs_da_read_buf+0x6e/0x128 [xfs]
[ 381.351375] [<000003ff7ff57838>] xfs_da3_node_read+0x40/0x78 [xfs]
[ 381.351467] [<000003ff7ff58c7a>] xfs_da3_node_lookup_int+0x82/0x558 [xfs]
[ 381.351558] [<000003ff7ff68d6e>] xfs_dir2_node_lookup+0x3e/0x140 [xfs]
[ 381.351651] [<000003ff80008436>] xchk_dir_lookup+0x13e/0x1e0 [xfs]
[ 381.351745] [<000003ff800078c2>] xchk_parent+0x82/0x168 [xfs]
[ 381.351840] [<000003ff8000aa1a>] xfs_scrub_metadata+0x1c2/0x420 [xfs]
[ 381.351938] [<000003ff7ffa3876>] xfs_ioc_scrub_metadata+0x5e/0xb0 [xfs]
[ 381.352035] [<000003ff7ffa5d92>] xfs_file_ioctl+0x672/0x9f8 [xfs]
[ 381.352133] [<0000000032744c9e>] __s390x_sys_ioctl+0xbe/0x100
[ 381.352135] [<0000000032d97060>] __do_syscall+0x1d0/0x1f8
[ 381.352138] [<0000000032da7040>] system_call+0x70/0x98
[ 381.352141] XFS (loop1): Corruption detected. Unmount and run xfs_repair
[ 381.352143] XFS (loop1): xfs_dabuf_map: bno 8388608 inode 128
[ 381.352174] XFS (loop1): [00] br_startoff 8388608 br_startblock -2 br_blockcount 1 br_state 0
[ 381.418153] XFS (loop1): Unmounting Filesystem 40196bb2-39f4-4c32-83ef-567f42216699
[3]
_check_xfs_filesystem: filesystem on /dev/loop1 failed scrub
*** xfs_scrub -v -d -n output ***
EXPERIMENTAL xfs_scrub program in use! Use at your own risk!
Phase 1: Find filesystem geometry.
/mnt/fstests/SCRATCH_DIR: using 2 threads to scrub.
Phase 2: Check internal metadata.
Info: AG 1 superblock: Optimization is possible. (scrub.c line 212)
Info: AG 2 superblock: Optimization is possible. (scrub.c line 212)
Info: AG 3 superblock: Optimization is possible. (scrub.c line 212)
Phase 3: Scan all inodes.
Corruption: inode 128 (0/128) inode record: Repairs are required. (scrub.c line 196)
Corruption: inode 128 (0/128) parent pointer: Repairs are required. (scrub.c line 196)
Corruption: inode 131 (0/131) inode record: Repairs are required. (scrub.c line 196)
...
Corruption: inode 58300 (0/58300) inode record: Repairs are required. (scrub.c line 196)
Corruption: inode 58301 (0/58301) inode record: Repairs are required. (scrub.c line 196)
Corruption: inode 58302 (0/58302) inode record: Repairs are required. (scrub.c line 196)
Corruption: inode 58303 (0/58303) inode record: Repairs are required. (scrub.c line 196)
Phase 5: Check directory tree.
Info: /mnt/fstests/SCRATCH_DIR: Filesystem has errors, skipping connectivity checks. (phase5.c line 393)
Phase 7: Check summary counters.
198.9MiB data used; 1.0K inodes used.
95.9MiB data found; 1.0K inodes found.
1.0K inodes counted; 1.0K inodes checked.
/mnt/fstests/SCRATCH_DIR: corruptions found: 1001
/mnt/fstests/SCRATCH_DIR: Re-run xfs_scrub without -n.
*** end xfs_scrub output
_check_xfs_filesystem: filesystem on /dev/loop1 is inconsistent (r)
*** xfs_repair -n output ***
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
bad nblocks 9 for inode 128, would reset to 0
no . entry for directory 128
no .. entry for root directory 128
problem with directory contents in inode 128
would clear root inode 128
bad nblocks 8 for inode 131, would reset to 0
bad nblocks 8 for inode 132, would reset to 0
bad nblocks 8 for inode 133, would reset to 0
...
bad nblocks 8 for inode 62438, would reset to 0
bad nblocks 8 for inode 62439, would reset to 0
bad nblocks 8 for inode 62440, would reset to 0
bad nblocks 8 for inode 62441, would reset to 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
root inode would be lost
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
bad nblocks 9 for inode 128, would reset to 0
- agno = 2
- agno = 3
no . entry for directory 128
no .. entry for root directory 128
problem with directory contents in inode 128
would clear root inode 128
bad nblocks 8 for inode 131, would reset to 0
bad nblocks 8 for inode 132, would reset to 0
bad nblocks 8 for inode 133, would reset to 0
...
bad nblocks 8 for inode 62439, would reset to 0
bad nblocks 8 for inode 62440, would reset to 0
bad nblocks 8 for inode 62441, would reset to 0
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
would reinitialize root directory
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
disconnected inode 131, would move to lost+found
disconnected inode 132, would move to lost+found
disconnected inode 133, would move to lost+found
...
disconnected inode 62439, would move to lost+found
disconnected inode 62440, would move to lost+found
disconnected inode 62441, would move to lost+found
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
*** end xfs_repair output
*** mount output ***
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=4096k,nr_inodes=986186,mode=755,inode64)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel,inode64)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,size=1592372k,nr_inodes=819200,mode=755,inode64)
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate,memory_recursiveprot)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime,seclabel)
bpf on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
/dev/mapper/rhel_s390x--kvm--091-root on / type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,nosuid,noexec,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=29,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=3937)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime,seclabel)
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime,seclabel)
tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime,seclabel)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel,pagesize=1M)
configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)
ramfs on /run/credentials/systemd-sysctl.service type ramfs (ro,nosuid,nodev,noexec,relatime,seclabel,mode=700)
ramfs on /run/credentials/systemd-tmpfiles-setup-dev.service type ramfs (ro,nosuid,nodev,noexec,relatime,seclabel,mode=700)
/dev/vda1 on /boot type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/mapper/rhel_s390x--kvm--091-home on /home type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota)
ramfs on /run/credentials/systemd-tmpfiles-setup.service type ramfs (ro,nosuid,nodev,noexec,relatime,seclabel,mode=700)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=796184k,nr_inodes=199046,mode=700,inode64)
*** end mount output
* Re: [Bug report][fstests generic/047] Internal error !(flags & XFS_DABUF_MAP_HOLE_OK) at line 2572 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_dabuf_map.constprop.0+0x26c/0x368 [xfs]

From: Dave Chinner @ 2023-11-06  6:13 UTC
To: Zorro Lang; +Cc: linux-xfs, fstests

On Sun, Oct 29, 2023 at 12:11:22PM +0800, Zorro Lang wrote:
> Hi xfs list,
>
> Recently I have been consistently hitting XFS corruption when running fstests
> generic/047 [1], and it shows more failures in dmesg [2], e.g.:

OK, g/047 is an fsync test.

>
> XFS (loop1): Internal error !(flags & XFS_DABUF_MAP_HOLE_OK) at line 2572 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_dabuf_map.constprop.0+0x26c/0x368 [xfs]

Ok, a directory block index translated to a hole in the file
mapping. That's bad...

> [2]
> [ 376.468885] run fstests generic/047 at 2023-10-27 09:08:07
> [ 376.675751] XFS (loop1): Mounting V5 Filesystem 716c9687-ee74-4c12-b6ad-a0b513194f2b
> [ 376.677088] XFS (loop1): Ending clean mount
> [ 376.678189] XFS (loop1): User initiated shutdown received.
> [ 376.678194] XFS (loop1): Metadata I/O Error (0x4) detected at xfs_fs_goingdown+0x5a/0xf8 [xfs] (fs/xfs/xfs_fsops.c:492). Shutting down filesystem.
> [ 376.678409] XFS (loop1): Please unmount the filesystem and rectify the problem(s)
> [ 376.679423] XFS (loop1): Unmounting Filesystem 716c9687-ee74-4c12-b6ad-a0b513194f2b
> [ 376.714910] XFS (loop1): Mounting V5 Filesystem 40196bb2-39f4-4c32-83ef-567f42216699
> [ 376.716353] XFS (loop1): Ending clean mount

Files are created and fsync'd here.

> [ 380.375878] XFS (loop1): User initiated shutdown received.
> [ 380.375888] XFS (loop1): Log I/O Error (0x6) detected at xfs_fs_goingdown+0xb4/0xf8 [xfs] (fs/xfs/xfs_fsops.c:495). Shutting down filesystem.

Then the fs is shut down.

> [ 380.376101] XFS (loop1): Please unmount the filesystem and rectify the problem(s)
> [ 380.380373] XFS (loop1): Unmounting Filesystem 40196bb2-39f4-4c32-83ef-567f42216699
> [ 380.383835] XFS (loop1): Mounting V5 Filesystem 40196bb2-39f4-4c32-83ef-567f42216699
> [ 380.397086] XFS (loop1): Starting recovery (logdev: internal)
> [ 380.465934] XFS (loop1): Ending recovery (logdev: internal)

Then it is recovered....

> [ 380.467409] XFS (loop1): Unmounting Filesystem 40196bb2-39f4-4c32-83ef-567f42216699
> [ 380.475431] XFS (loop1): Mounting V5 Filesystem 40196bb2-39f4-4c32-83ef-567f42216699
> [ 380.477235] XFS (loop1): Ending clean mount
> [ 380.477500] XFS (loop1): Internal error !(flags & XFS_DABUF_MAP_HOLE_OK) at line 2572 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_dabuf_map.constprop.0+0x26c/0x368 [xfs]

.... and now the directory is bad.

> [ 380.477636] CPU: 0 PID: 337362 Comm: 047 Kdump: loaded Tainted: G W 6.6.0-rc7+ #1
> [ 380.477639] Hardware name: IBM 3931 LA1 400 (KVM/Linux)
> [ 380.477641] Call Trace:
> [ 380.477642] [<0000000032d71372>] dump_stack_lvl+0x62/0x80
> [ 380.477648] [<000003ff7ff96c00>] xfs_corruption_error+0x70/0xa0 [xfs]
> [ 380.477762] [<000003ff7ff551ce>] xfs_dabuf_map.constprop.0+0x2a6/0x368 [xfs]
> [ 380.477871] [<000003ff7ff5773e>] xfs_da_read_buf+0x6e/0x128 [xfs]
> [ 380.477977] [<000003ff7ff57838>] xfs_da3_node_read+0x40/0x78 [xfs]
> [ 380.478085] [<000003ff7ff58c7a>] xfs_da3_node_lookup_int+0x82/0x558 [xfs]
> [ 380.478193] [<000003ff7ff68d6e>] xfs_dir2_node_lookup+0x3e/0x140 [xfs]

So it's supposed to be in node format, which means enough blocks to
have an external free list. I guess a thousand dirents is enough to
do that.

Yet fsync is run after every file is created and written, so the
dirents and directory blocks should all be there....

.....

> _check_xfs_filesystem: filesystem on /dev/loop1 is inconsistent (r)
> *** xfs_repair -n output ***
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
> - zero log...
> - scan filesystem freespace and inode maps...
> - found root inode chunk
> Phase 3 - for each AG...
> - scan (but don't clear) agi unlinked lists...
> - process known inodes and perform inode discovery...
> - agno = 0
> bad nblocks 9 for inode 128, would reset to 0
> no . entry for directory 128
> no .. entry for root directory 128
> problem with directory contents in inode 128
> would clear root inode 128
> bad nblocks 8 for inode 131, would reset to 0
> bad nblocks 8 for inode 132, would reset to 0
> bad nblocks 8 for inode 133, would reset to 0
> ...
> bad nblocks 8 for inode 62438, would reset to 0
> bad nblocks 8 for inode 62439, would reset to 0
> bad nblocks 8 for inode 62440, would reset to 0
> bad nblocks 8 for inode 62441, would reset to 0

Yet all the files - including the data files that were fsync'd - are
all bad.

Apparently the journal has been recovered, but lots of metadata
updates that should have been in the journal are missing after
recovery has completed? That doesn't make a whole lot of sense -
when did these tests start failing? Can you run a bisect?

-Dave.
--
Dave Chinner
david@fromorbit.com
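
To make the "hole in the file mapping" remark concrete: xfs_dabuf_map()
translates a directory/attr block number into an extent mapping (via
xfs_bmapi_read()) and only tolerates a hole if the caller passed
XFS_DABUF_MAP_HOLE_OK. A standalone, simplified sketch of just that
decision follows; it is not the kernel code itself, and the flag's bit
value below is an assumption:

#include <stdio.h>

#define HOLESTARTBLOCK          (-2LL)          /* from xfs_types.h */
#define XFS_DABUF_MAP_HOLE_OK   (1u << 0)       /* flag bit value assumed */

struct bmbt_irec_sketch {
        unsigned long long br_startoff;
        long long          br_startblock;       /* -2 encodes a hole */
};

/* The decision xfs_dabuf_map() makes once it has the mapping: a hole
 * is corruption unless the caller explicitly opted in. */
static int dabuf_map_check(const struct bmbt_irec_sketch *irec,
                           unsigned int flags)
{
        if (irec->br_startblock == HOLESTARTBLOCK &&
            !(flags & XFS_DABUF_MAP_HOLE_OK))
                return -1;      /* kernel raises the corruption error here */
        return 0;
}

int main(void)
{
        /* the mapping from the report: [00] br_startoff 8388608
         * br_startblock -2 br_blockcount 1 */
        struct bmbt_irec_sketch irec = {
                .br_startoff = 8388608,
                .br_startblock = HOLESTARTBLOCK,
        };

        if (dabuf_map_check(&irec, 0))
                printf("leaf block %llu maps to a hole -> EFSCORRUPTED\n",
                       irec.br_startoff);
        return 0;
}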
* Re: [Bug report][fstests generic/047] Internal error !(flags & XFS_DABUF_MAP_HOLE_OK) at line 2572 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_dabuf_map.constprop.0+0x26c/0x368 [xfs]

From: Zorro Lang @ 2023-11-06 19:26 UTC
To: Dave Chinner; +Cc: linux-xfs, fstests, Darrick J. Wong, Carlos Maiolino

On Mon, Nov 06, 2023 at 05:13:30PM +1100, Dave Chinner wrote:
> On Sun, Oct 29, 2023 at 12:11:22PM +0800, Zorro Lang wrote:
> > Hi xfs list,
> >
> > Recently I have been consistently hitting XFS corruption when running fstests
> > generic/047 [1], and it shows more failures in dmesg [2], e.g.:
>
> OK, g/047 is an fsync test.
>
> > XFS (loop1): Internal error !(flags & XFS_DABUF_MAP_HOLE_OK) at line 2572 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_dabuf_map.constprop.0+0x26c/0x368 [xfs]
>
> Ok, a directory block index translated to a hole in the file
> mapping. That's bad...
>
.....
>
> Yet all the files - including the data files that were fsync'd - are
> all bad.
>
> Apparently the journal has been recovered, but lots of metadata
> updates that should have been in the journal are missing after
> recovery has completed? That doesn't make a whole lot of sense -
> when did these tests start failing? Can you run a bisect?

Hi Dave,

Thanks for your reply :) I tried a kernel bisect for a long time, but
found nothing ... Then suddenly, I found that it started failing from an
xfsprogs change [1].

Although that's not the root cause of this bug (on s390x), that change
just enabled "nrext64" by default, which I had never tested on s390x
before. For now, we know this is an issue with that feature, and so far
only on s390x.

Thanks,
Zorro

[1]
commit e5b18d7d1d962e942ce3b0a9ccdb5872074e24df
Author: Darrick J. Wong <djwong@kernel.org>
Date:   Mon Sep 25 14:59:25 2023 -0700

    mkfs: enable large extent counts by default

> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
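
For readers unfamiliar with the feature: "nrext64" is the large extent
counters feature (incompat bit 0x20, which matches the v5.18 mount failure
quoted later in this thread). What it changes in the on-disk inode can be
read off the pahole output further down: the 64-bit data fork extent count
moves into the old v2 pad/flushiter space at bytes 24-31, and the 32-bit
attr fork count overlays the old 32-bit data fork count at byte 76. A
simplified sketch of just those two unions (plain uintN_t types stand in
for the big-endian __beNN on-disk types; not the real headers):

#include <stdint.h>

struct dinode_extcnt_sketch {
        /* bytes 24-31 of struct xfs_dinode */
        union {
                uint64_t di_big_nextents;       /* nrext64: 64-bit data fork count */
                uint64_t di_v3_pad;             /* v3, !nrext64: zero padding */
                struct {
                        uint8_t  di_v2_pad[6];
                        uint16_t di_flushiter;  /* bytes 30-31: v2 flush counter */
                };
        };
        /* bytes 76-81 of struct xfs_dinode */
        union {
                struct {
                        uint32_t di_nextents;   /* !nrext64: 32-bit data fork count */
                        uint16_t di_anextents;  /* !nrext64: 16-bit attr fork count */
                };
                struct {
                        uint32_t di_big_anextents; /* nrext64: 32-bit attr fork count */
                        uint16_t di_nrext64_pad;
                };
        };
};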
* Re: [Bug report][fstests generic/047] Internal error !(flags & XFS_DABUF_MAP_HOLE_OK) at line 2572 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_dabuf_map.constprop.0+0x26c/0x368 [xfs]

From: Dave Chinner @ 2023-11-06 20:33 UTC
To: Zorro Lang; +Cc: linux-xfs, fstests, Darrick J. Wong, Carlos Maiolino

On Tue, Nov 07, 2023 at 03:26:27AM +0800, Zorro Lang wrote:
> On Mon, Nov 06, 2023 at 05:13:30PM +1100, Dave Chinner wrote:
.....
> > Apparently the journal has been recovered, but lots of metadata
> > updates that should have been in the journal are missing after
> > recovery has completed? That doesn't make a whole lot of sense -
> > when did these tests start failing? Can you run a bisect?
>
> Hi Dave,
>
> Thanks for your reply :) I tried a kernel bisect for a long time, but
> found nothing ... Then suddenly, I found that it started failing from an
> xfsprogs change [1].
>
> Although that's not the root cause of this bug (on s390x), that change
> just enabled "nrext64" by default, which I had never tested on s390x
> before. For now, we know this is an issue with that feature, and so far
> only on s390x.

That's not good. Can you please determine if this is a zero-day bug
with the nrext64 feature? I think it was merged in 5.19, so if you
could try to reproduce it on 5.18 and 5.19 kernels first, that
would be handy.

Also, from your s390 kernel build, can you get the pahole output
for the struct xfs_dinode both for a good kernel and a bad kernel?

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [Bug report][fstests generic/047] Internal error !(flags & XFS_DABUF_MAP_HOLE_OK) at line 2572 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_dabuf_map.constprop.0+0x26c/0x368 [xfs]

From: Darrick J. Wong @ 2023-11-06 22:20 UTC
To: Dave Chinner; +Cc: Zorro Lang, linux-xfs, fstests, Carlos Maiolino

On Tue, Nov 07, 2023 at 07:33:50AM +1100, Dave Chinner wrote:
> On Tue, Nov 07, 2023 at 03:26:27AM +0800, Zorro Lang wrote:
.....
> > Although that's not the root cause of this bug (on s390x), that change
> > just enabled "nrext64" by default, which I had never tested on s390x
> > before. For now, we know this is an issue with that feature, and so far
> > only on s390x.
>
> That's not good. Can you please determine if this is a zero-day bug
> with the nrext64 feature? I think it was merged in 5.19, so if you
> could try to reproduce it on 5.18 and 5.19 kernels first, that
> would be handy.
>
> Also, from your s390 kernel build, can you get the pahole output
> for the struct xfs_dinode both for a good kernel and a bad kernel?

And could you get the pahole output of xfs_log_dinode as well?

I'm assuming that these problems only come up after *log recovery* and
that generic/476 is fine.

--D

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
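
Some context for the *log recovery* angle: the inode core is logged as a
host-endian struct xfs_log_dinode, and recovery copies it field by field
into the big-endian on-disk struct xfs_dinode (xfs_log_dinode_to_disk() in
fs/xfs/xfs_inode_item_recover.c), so a mistake in how one of the overlapping
union fields is copied would only surface after a shutdown-and-recover
cycle, which is exactly what generic/047 does. A rough, userspace-shaped
sketch of the copy's structure (illustrative only, not the kernel function;
htobe*() stands in for the kernel's cpu_to_be*()):

#include <endian.h>
#include <stdint.h>

/* Host-endian image of the inode core as it sits in the log
 * (simplified; see the xfs_log_dinode pahole output below). */
struct log_dinode_sketch {
        uint16_t di_magic;
        uint64_t di_big_nextents;       /* in the real struct, di_flushiter
                                         * aliases bytes 30-31 of this */
};

/* Big-endian on-disk image (simplified struct xfs_dinode). */
struct dinode_sketch {
        uint16_t di_magic;              /* __be16 on disk */
        uint64_t di_big_nextents;       /* __be64 on disk */
};

/* Illustrative shape of what recovery does: copy host-order log fields
 * into the big-endian disk inode. Writing the wrong union member (or
 * the wrong width) here is invisible until a recovery pass runs. */
static void log_dinode_to_disk_sketch(const struct log_dinode_sketch *from,
                                      struct dinode_sketch *to)
{
        to->di_magic = htobe16(from->di_magic);
        to->di_big_nextents = htobe64(from->di_big_nextents);
        /* ... every other core field, converted host -> big-endian ... */
}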
* Re: [Bug report][fstests generic/047] Internal error !(flags & XFS_DABUF_MAP_HOLE_OK) at line 2572 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_dabuf_map.constprop.0+0x26c/0x368 [xfs]

From: Zorro Lang @ 2023-11-07  8:05 UTC
To: Dave Chinner; +Cc: linux-xfs, fstests, Darrick J. Wong, Carlos Maiolino

On Tue, Nov 07, 2023 at 07:33:50AM +1100, Dave Chinner wrote:
> On Tue, Nov 07, 2023 at 03:26:27AM +0800, Zorro Lang wrote:
.....
> > Although that's not the root cause of this bug (on s390x), that change
> > just enabled "nrext64" by default, which I had never tested on s390x
> > before. For now, we know this is an issue with that feature, and so far
> > only on s390x.
>
> That's not good. Can you please determine if this is a zero-day bug
> with the nrext64 feature? I think it was merged in 5.19, so if you
> could try to reproduce it on 5.18 and 5.19 kernels first, that
> would be handy.

Unfortunately, it looks like a bug that has been there nearly from the
beginning. Linux v5.19 can trigger this bug (with the latest xfsprogs
for-next branch):

FSTYP -- xfs (debug)
PLATFORM -- Linux/s390x s390x-kvm-026 5.19.0 #1 SMP Tue Nov 7 00:44:49 EST 2023
MKFS_OPTIONS -- -f -m crc=1,finobt=1,rmapbt=0,reflink=1,bigtime=1,inobtcount=1 /dev/loop1
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/loop1 /mnt/fstests/SCRATCH_DIR

generic/047 _check_xfs_filesystem: filesystem on /dev/loop1 failed scrub
(see /var/lib/xfstests/results//generic/047.full for details)
_check_xfs_filesystem: filesystem on /dev/loop1 is inconsistent (r)
(see /var/lib/xfstests/results//generic/047.full for details)
_check_dmesg: something found in dmesg (see /var/lib/xfstests/results//generic/047.dmesg)
- output mismatch (see /var/lib/xfstests/results//generic/047.out.bad)
--- tests/generic/047.out 2023-11-07 01:18:41.687291490 -0500
+++ /var/lib/xfstests/results//generic/047.out.bad 2023-11-07 01:19:32.237271471 -0500
@@ -1 +1,1000 @@
QA output created by 047
+file /mnt/fstests/SCRATCH_DIR/1 missing - fsync failed
+file /mnt/fstests/SCRATCH_DIR/2 missing - fsync failed
+file /mnt/fstests/SCRATCH_DIR/3 missing - fsync failed
+file /mnt/fstests/SCRATCH_DIR/4 missing - fsync failed
+file /mnt/fstests/SCRATCH_DIR/5 missing - fsync failed
+file /mnt/fstests/SCRATCH_DIR/6 missing - fsync failed
...
(Run 'diff -u /var/lib/xfstests/tests/generic/047.out /var/lib/xfstests/results//generic/047.out.bad' to see the entire diff)
Ran: generic/047
Failures: generic/047
Failed 1 of 1 tests

[ 252.399348] XFS (loop1): Corruption detected. Unmount and run xfs_repair
[ 252.399349] XFS (loop1): xfs_dabuf_map: bno 8388608 inode 128
[ 252.399351] XFS (loop1): [00] br_startoff 8388608 br_startblock -2 br_blockcount 1 br_state 0
[ 252.399367] XFS (loop1): Internal error !(flags & XFS_DABUF_MAP_HOLE_OK) at line 2572 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_dabuf_map.constprop.0+0x26c/0x370 [xfs]
[ 252.399452] CPU: 1 PID: 129486 Comm: 047 Kdump: loaded Not tainted 5.19.0 #1
[ 252.399454] Hardware name: IBM 8561 LT1 400 (KVM/Linux)
[ 252.399455] Call Trace:
[ 252.399469] [<000000010fe8b17a>] dump_stack_lvl+0x62/0x80
[ 252.399471] [<000003ff8021e3c0>] xfs_corruption_error+0x70/0xa0 [xfs]
[ 252.399561] [<000003ff801dcbd6>] xfs_dabuf_map.constprop.0+0x2a6/0x370 [xfs]
[ 252.399646] [<000003ff801df1ce>] xfs_da_read_buf+0x6e/0x130 [xfs]
[ 252.399731] [<000003ff801df2d0>] xfs_da3_node_read+0x40/0x80 [xfs]
[ 252.399816] [<000003ff801e0732>] xfs_da3_node_lookup_int+0x82/0x560 [xfs]
[ 252.399900] [<000003ff801f08de>] xfs_dir2_node_lookup+0x3e/0x150 [xfs]
[ 252.399986] [<000003ff801e49de>] xfs_dir_lookup+0x1ee/0x220 [xfs]
[ 252.400071] [<000003ff80235da0>] xfs_lookup+0x60/0x110 [xfs]
[ 252.400161] [<000003ff802310b2>] xfs_vn_lookup+0x62/0xa0 [xfs]
[ 252.400251] [<000000010f86eb7a>] __lookup_slow+0x9a/0x170
[ 252.400253] [<000000010f87381a>] walk_component+0x14a/0x1e0
[ 252.400255] [<000000010f874098>] path_lookupat+0x88/0x1f0
[ 252.400257] [<000000010f87569a>] filename_lookup+0xaa/0x1a0
[ 252.400259] [<000000010f866a66>] vfs_statx+0x86/0x120
[ 252.400261] [<000000010f866da4>] vfs_fstatat+0x74/0xa0
[ 252.400263] [<000000010f867038>] __do_sys_newfstatat+0x28/0x50
[ 252.400264] [<000000010fe8eeb0>] __do_syscall+0x1d0/0x200
[ 252.400266] [<000000010fe9d6c2>] system_call+0x82/0xb0

And v5.18 doesn't support this feature, so ...

[ 95.265522] XFS (loop0): Superblock has unknown incompatible features (0x20) enabled.
[ 95.265533] XFS (loop0): Filesystem cannot be safely mounted by this kernel.
[ 95.265535] XFS (loop0): SB validate failed with error -22.

If needed, tell me which commit (or commit range) first introduced this
feature, or which commit you suspect, and I can give it specific
testing.

Thanks,
Zorro

> Also, from your s390 kernel build, can you get the pahole output
> for the struct xfs_dinode both for a good kernel and a bad kernel?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
* Re: [Bug report][fstests generic/047] Internal error !(flags & XFS_DABUF_MAP_HOLE_OK) at line 2572 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_dabuf_map.constprop.0+0x26c/0x368 [xfs]

From: Dave Chinner @ 2023-11-07  8:13 UTC
To: Zorro Lang; +Cc: linux-xfs, fstests, Darrick J. Wong, Carlos Maiolino

On Tue, Nov 07, 2023 at 04:05:22PM +0800, Zorro Lang wrote:
> On Tue, Nov 07, 2023 at 07:33:50AM +1100, Dave Chinner wrote:
> > On Tue, Nov 07, 2023 at 03:26:27AM +0800, Zorro Lang wrote:
> > > Thanks for your reply :) I tried a kernel bisect for a long time, but
> > > found nothing ... Then suddenly, I found that it started failing from an
> > > xfsprogs change [1].
> > >
> > > Although that's not the root cause of this bug (on s390x), that change
> > > just enabled "nrext64" by default, which I had never tested on s390x
> > > before. For now, we know this is an issue with that feature, and so far
> > > only on s390x.
> >
> > That's not good. Can you please determine if this is a zero-day bug
> > with the nrext64 feature? I think it was merged in 5.19, so if you
> > could try to reproduce it on 5.18 and 5.19 kernels first, that
> > would be handy.
>
> Unfortunately, it looks like a bug that has been there nearly from the
> beginning. Linux v5.19 can trigger this bug (with the latest xfsprogs
> for-next branch):

Ok. Can you grab the pahole output for the xfs_dinode and
xfs_log_dinode for s390 from both 5.18 and 5.19 kernel builds?
(i.e. 'pahole fs/xfs/xfs_inode.o |less' and search for the two
structures).

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [Bug report][fstests generic/047] Internal error !(flags & XFS_DABUF_MAP_HOLE_OK) at line 2572 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_dabuf_map.constprop.0+0x26c/0x368 [xfs] 2023-11-07 8:13 ` Dave Chinner @ 2023-11-07 15:13 ` Zorro Lang 2023-11-08 6:38 ` Dave Chinner 0 siblings, 1 reply; 19+ messages in thread From: Zorro Lang @ 2023-11-07 15:13 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs, fstests, Darrick J. Wong, Carlos Maiolino On Tue, Nov 07, 2023 at 07:13:39PM +1100, Dave Chinner wrote: > On Tue, Nov 07, 2023 at 04:05:22PM +0800, Zorro Lang wrote: > > On Tue, Nov 07, 2023 at 07:33:50AM +1100, Dave Chinner wrote: > > > On Tue, Nov 07, 2023 at 03:26:27AM +0800, Zorro Lang wrote: > > > > Thanks for your reply :) I tried to do a kernel bisect long time, but > > > > find nothing ... Then suddently, I found it's failed from a xfsprogs > > > > change [1]. > > > > > > > > Although that's not the root cause of this bug (on s390x), it just > > > > enabled "nrext64" by default, which I never tested on s390x before. > > > > For now, we know this's an issue about this feature, and only on > > > > s390x for now. > > > > > > That's not good. Can you please determine if this is a zero-day bug > > > with the nrext64 feature? I think it was merged in 5.19, so if you > > > could try to reproduce it on a 5.18 and 5.19 kernels first, that > > > would be handy. > > > > Unfortunately, it's a bug be there nearly from beginning. The linux v5.19 > > can trigger this bug (with latest xfsprogs for-next branch): > > Ok. Can you grab the pahole output for the xfs_dinode and > xfs_log_dinode for s390 from both 5.18 and 5.19 kernel builds? > (i.e. 'pahole fs/xfs/xfs_inode.o |less' and search for the two > structures). I can't find xfs_log_dinode in fs/xfs/xfs_inode.o, but I can find both structures in fs/xfs/xfs_inode_item.o, so below output base on: # pahole fs/xfs/xfs_inode_item.o The output on v5.19 is [1], v5.18 output is [2], the diff of 5.18 and 5.19 is [3]. 
Thanks,
Zorro

[1] v5.19:

struct xfs_dinode {
	__be16 di_magic;	/* 0 2 */
	__be16 di_mode;	/* 2 2 */
	__u8 di_version;	/* 4 1 */
	__u8 di_format;	/* 5 1 */
	__be16 di_onlink;	/* 6 2 */
	__be32 di_uid;	/* 8 4 */
	__be32 di_gid;	/* 12 4 */
	__be32 di_nlink;	/* 16 4 */
	__be16 di_projid_lo;	/* 20 2 */
	__be16 di_projid_hi;	/* 22 2 */
	union {
		__be64 di_big_nextents;	/* 24 8 */
		__be64 di_v3_pad;	/* 24 8 */
		struct {
			__u8 di_v2_pad[6];	/* 24 6 */
			__be16 di_flushiter;	/* 30 2 */
		};	/* 24 8 */
	};	/* 24 8 */
	xfs_timestamp_t di_atime;	/* 32 8 */
	xfs_timestamp_t di_mtime;	/* 40 8 */
	xfs_timestamp_t di_ctime;	/* 48 8 */
	__be64 di_size;	/* 56 8 */
	__be64 di_nblocks;	/* 64 8 */
	__be32 di_extsize;	/* 72 4 */
	union {
		struct {
			__be32 di_nextents;	/* 76 4 */
			__be16 di_anextents;	/* 80 2 */
		} __attribute__((__packed__));	/* 76 6 */
		struct {
			__be32 di_big_anextents;	/* 76 4 */
			__be16 di_nrext64_pad;	/* 80 2 */
		} __attribute__((__packed__));	/* 76 6 */
	};	/* 76 6 */
	__u8 di_forkoff;	/* 82 1 */
	__s8 di_aformat;	/* 83 1 */
	__be32 di_dmevmask;	/* 84 4 */
	__be16 di_dmstate;	/* 88 2 */
	__be16 di_flags;	/* 90 2 */
	__be32 di_gen;	/* 92 4 */
	__be32 di_next_unlinked;	/* 96 4 */
	__le32 di_crc;	/* 100 4 */
	__be64 di_changecount;	/* 104 8 */
	__be64 di_lsn;	/* 112 8 */
	__be64 di_flags2;	/* 120 8 */
	__be32 di_cowextsize;	/* 128 4 */
	__u8 di_pad2[12];	/* 132 12 */
	xfs_timestamp_t di_crtime;	/* 144 8 */
	__be64 di_ino;	/* 152 8 */
	uuid_t di_uuid;	/* 160 16 */

	/* size: 176, cachelines: 1, members: 34 */
	/* last cacheline: 176 bytes */
};

struct xfs_log_dinode {
	uint16_t di_magic;	/* 0 2 */
	uint16_t di_mode;	/* 2 2 */
	int8_t di_version;	/* 4 1 */
	int8_t di_format;	/* 5 1 */
	uint8_t di_pad3[2];	/* 6 2 */
	uint32_t di_uid;	/* 8 4 */
	uint32_t di_gid;	/* 12 4 */
	uint32_t di_nlink;	/* 16 4 */
	uint16_t di_projid_lo;	/* 20 2 */
	uint16_t di_projid_hi;	/* 22 2 */
	union {
		uint64_t di_big_nextents;	/* 24 8 */
		uint64_t di_v3_pad;	/* 24 8 */
		struct {
			uint8_t di_v2_pad[6];	/* 24 6 */
			uint16_t di_flushiter;	/* 30 2 */
		};	/* 24 8 */
	};	/* 24 8 */
	xfs_log_timestamp_t di_atime;	/* 32 8 */
	xfs_log_timestamp_t di_mtime;	/* 40 8 */
	xfs_log_timestamp_t di_ctime;	/* 48 8 */
	xfs_fsize_t di_size;	/* 56 8 */
	xfs_rfsblock_t di_nblocks;	/* 64 8 */
	xfs_extlen_t di_extsize;	/* 72 4 */
	union {
		struct {
			uint32_t di_nextents;	/* 76 4 */
			uint16_t di_anextents;	/* 80 2 */
		} __attribute__((__packed__));	/* 76 6 */
		struct {
			uint32_t di_big_anextents;	/* 76 4 */
			uint16_t di_nrext64_pad;	/* 80 2 */
		} __attribute__((__packed__));	/* 76 6 */
	};	/* 76 6 */
	uint8_t di_forkoff;	/* 82 1 */
	int8_t di_aformat;	/* 83 1 */
	uint32_t di_dmevmask;	/* 84 4 */
	uint16_t di_dmstate;	/* 88 2 */
	uint16_t di_flags;	/* 90 2 */
	uint32_t di_gen;	/* 92 4 */
	xfs_agino_t di_next_unlinked;	/* 96 4 */
	uint32_t di_crc;	/* 100 4 */
	uint64_t di_changecount;	/* 104 8 */
	xfs_lsn_t di_lsn;	/* 112 8 */
	uint64_t di_flags2;	/* 120 8 */
	uint32_t di_cowextsize;	/* 128 4 */
	uint8_t di_pad2[12];	/* 132 12 */
	xfs_log_timestamp_t di_crtime;	/* 144 8 */
	xfs_ino_t di_ino;	/* 152 8 */
	uuid_t di_uuid;	/* 160 16 */

	/* size: 176, cachelines: 1, members: 34 */
	/* last cacheline: 176 bytes */
};

[2] v5.18:

struct xfs_dinode {
	__be16 di_magic;	/* 0 2 */
	__be16 di_mode;	/* 2 2 */
	__u8 di_version;	/* 4 1 */
	__u8 di_format;	/* 5 1 */
	__be16 di_onlink;	/* 6 2 */
	__be32 di_uid;	/* 8 4 */
	__be32 di_gid;	/* 12 4 */
	__be32 di_nlink;	/* 16 4 */
	__be16 di_projid_lo;	/* 20 2 */
	__be16 di_projid_hi;	/* 22 2 */
	__u8 di_pad[6];	/* 24 6 */
	__be16 di_flushiter;	/* 30 2 */
	xfs_timestamp_t di_atime;	/* 32 8 */
	xfs_timestamp_t di_mtime;	/* 40 8 */
	xfs_timestamp_t di_ctime;	/* 48 8 */
	__be64 di_size;	/* 56 8 */
	__be64 di_nblocks;	/* 64 8 */
	__be32 di_extsize;	/* 72 4 */
	__be32 di_nextents;	/* 76 4 */
	__be16 di_anextents;	/* 80 2 */
	__u8 di_forkoff;	/* 82 1 */
	__s8 di_aformat;	/* 83 1 */
	__be32 di_dmevmask;	/* 84 4 */
	__be16 di_dmstate;	/* 88 2 */
	__be16 di_flags;	/* 90 2 */
	__be32 di_gen;	/* 92 4 */
	__be32 di_next_unlinked;	/* 96 4 */
	__le32 di_crc;	/* 100 4 */
	__be64 di_changecount;	/* 104 8 */
	__be64 di_lsn;	/* 112 8 */
	__be64 di_flags2;	/* 120 8 */
	__be32 di_cowextsize;	/* 128 4 */
	__u8 di_pad2[12];	/* 132 12 */
	xfs_timestamp_t di_crtime;	/* 144 8 */
	__be64 di_ino;	/* 152 8 */
	uuid_t di_uuid;	/* 160 16 */

	/* size: 176, cachelines: 1, members: 36 */
	/* last cacheline: 176 bytes */
};

struct xfs_log_dinode {
	uint16_t di_magic;	/* 0 2 */
	uint16_t di_mode;	/* 2 2 */
	int8_t di_version;	/* 4 1 */
	int8_t di_format;	/* 5 1 */
	uint8_t di_pad3[2];	/* 6 2 */
	uint32_t di_uid;	/* 8 4 */
	uint32_t di_gid;	/* 12 4 */
	uint32_t di_nlink;	/* 16 4 */
	uint16_t di_projid_lo;	/* 20 2 */
	uint16_t di_projid_hi;	/* 22 2 */
	uint8_t di_pad[6];	/* 24 6 */
	uint16_t di_flushiter;	/* 30 2 */
	xfs_log_timestamp_t di_atime;	/* 32 8 */
	xfs_log_timestamp_t di_mtime;	/* 40 8 */
	xfs_log_timestamp_t di_ctime;	/* 48 8 */
	xfs_fsize_t di_size;	/* 56 8 */
	xfs_rfsblock_t di_nblocks;	/* 64 8 */
	xfs_extlen_t di_extsize;	/* 72 4 */
	xfs_extnum_t di_nextents;	/* 76 4 */
	xfs_aextnum_t di_anextents;	/* 80 2 */
	uint8_t di_forkoff;	/* 82 1 */
	int8_t di_aformat;	/* 83 1 */
	uint32_t di_dmevmask;	/* 84 4 */
	uint16_t di_dmstate;	/* 88 2 */
	uint16_t di_flags;	/* 90 2 */
	uint32_t di_gen;	/* 92 4 */
	xfs_agino_t di_next_unlinked;	/* 96 4 */
	uint32_t di_crc;	/* 100 4 */
	uint64_t di_changecount;	/* 104 8 */
	xfs_lsn_t di_lsn;	/* 112 8 */
	uint64_t di_flags2;	/* 120 8 */
	uint32_t di_cowextsize;	/* 128 4 */
	uint8_t di_pad2[12];	/* 132 12 */
	xfs_log_timestamp_t di_crtime;	/* 144 8 */
	xfs_ino_t di_ino;	/* 152 8 */
	uuid_t di_uuid;	/* 160 16 */

	/* size: 176, cachelines: 1, members: 36 */
	/* last cacheline: 176 bytes */
};

[3] # diff -Nup xfs_inode_item.pahole.518 xfs_inode_item.pahole.519

@@ -6763,16 +6799,30 @@ struct xfs_dinode {
 	__be32 di_nlink;	/* 16 4 */
 	__be16 di_projid_lo;	/* 20 2 */
 	__be16 di_projid_hi;	/* 22 2 */
-	__u8 di_pad[6];	/* 24 6 */
-	__be16 di_flushiter;	/* 30 2 */
+	union {
+		__be64 di_big_nextents;	/* 24 8 */
+		__be64 di_v3_pad;	/* 24 8 */
+		struct {
+			__u8 di_v2_pad[6];	/* 24 6 */
+			__be16 di_flushiter;	/* 30 2 */
+		};	/* 24 8 */
+	};	/* 24 8 */
 	xfs_timestamp_t di_atime;	/* 32 8 */
 	xfs_timestamp_t di_mtime;	/* 40 8 */
 	xfs_timestamp_t di_ctime;	/* 48 8 */
 	__be64 di_size;	/* 56 8 */
 	__be64 di_nblocks;	/* 64 8 */
 	__be32 di_extsize;	/* 72 4 */
-	__be32 di_nextents;	/* 76 4 */
-	__be16 di_anextents;	/* 80 2 */
+	union {
+		struct {
+			__be32 di_nextents;	/* 76 4 */
+			__be16 di_anextents;	/* 80 2 */
+		} __attribute__((__packed__));	/* 76 6 */
+		struct {
+			__be32 di_big_anextents;	/* 76 4 */
+			__be16 di_nrext64_pad;	/* 80 2 */
+		} __attribute__((__packed__));	/* 76 6 */
+	};	/* 76 6 */
 	__u8 di_forkoff;	/* 82 1 */
 	__s8 di_aformat;	/* 83 1 */
 	__be32 di_dmevmask;	/* 84 4 */
@@ -6790,7 +6840,7 @@ struct xfs_dinode {
 	__be64 di_ino;	/* 152 8 */
 	uuid_t di_uuid;	/* 160 16 */
 
-	/* size: 176, cachelines: 1, members: 36 */
+	/* size: 176, cachelines: 1, members: 34 */
 	/* last cacheline: 176 bytes */
 };
... ...
@@ -6932,16 +6992,30 @@ struct xfs_log_dinode {
 	uint32_t di_nlink;	/* 16 4 */
 	uint16_t di_projid_lo;	/* 20 2 */
 	uint16_t di_projid_hi;	/* 22 2 */
-	uint8_t di_pad[6];	/* 24 6 */
-	uint16_t di_flushiter;	/* 30 2 */
+	union {
+		uint64_t di_big_nextents;	/* 24 8 */
+		uint64_t di_v3_pad;	/* 24 8 */
+		struct {
+			uint8_t di_v2_pad[6];	/* 24 6 */
+			uint16_t di_flushiter;	/* 30 2 */
+		};	/* 24 8 */
+	};	/* 24 8 */
 	xfs_log_timestamp_t di_atime;	/* 32 8 */
 	xfs_log_timestamp_t di_mtime;	/* 40 8 */
 	xfs_log_timestamp_t di_ctime;	/* 48 8 */
 	xfs_fsize_t di_size;	/* 56 8 */
 	xfs_rfsblock_t di_nblocks;	/* 64 8 */
 	xfs_extlen_t di_extsize;	/* 72 4 */
-	xfs_extnum_t di_nextents;	/* 76 4 */
-	xfs_aextnum_t di_anextents;	/* 80 2 */
+	union {
+		struct {
+			uint32_t di_nextents;	/* 76 4 */
+			uint16_t di_anextents;	/* 80 2 */
+		} __attribute__((__packed__));	/* 76 6 */
+		struct {
+			uint32_t di_big_anextents;	/* 76 4 */
+			uint16_t di_nrext64_pad;	/* 80 2 */
+		} __attribute__((__packed__));	/* 76 6 */
+	};	/* 76 6 */
 	uint8_t di_forkoff;	/* 82 1 */
 	int8_t di_aformat;	/* 83 1 */
 	uint32_t di_dmevmask;	/* 84 4 */
@@ -6959,7 +7033,7 @@ struct xfs_log_dinode {
 	xfs_ino_t di_ino;	/* 152 8 */
 	uuid_t di_uuid;	/* 160 16 */
 
-	/* size: 176, cachelines: 1, members: 36 */
+	/* size: 176, cachelines: 1, members: 34 */
 	/* last cacheline: 176 bytes */
 };

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
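[The overlap that matters in the v5.19 layout above is that di_flushiter sits at offset 30, inside the same eight bytes (starting at offset 24) as di_big_nextents. A minimal stand-alone sketch can verify those offsets independently of pahole; note this is a mock with plain uintN_t stand-ins for the kernel's __beNN types, not kernel code:]

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock of the start of the v5.19 xfs_dinode, using host-order types
 * purely so the offsets can be checked with offsetof(). */
struct dinode_prefix_mock {
	uint16_t di_magic;	/*  0 */
	uint16_t di_mode;	/*  2 */
	uint8_t  di_version;	/*  4 */
	uint8_t  di_format;	/*  5 */
	uint16_t di_onlink;	/*  6 */
	uint32_t di_uid;	/*  8 */
	uint32_t di_gid;	/* 12 */
	uint32_t di_nlink;	/* 16 */
	uint16_t di_projid_lo;	/* 20 */
	uint16_t di_projid_hi;	/* 22 */
	union {
		uint64_t di_big_nextents;	/* 24, 8 bytes */
		struct {
			uint8_t  di_v2_pad[6];	/* 24 */
			uint16_t di_flushiter;	/* 30 */
		};
	};
};

int main(void)
{
	/* the same offsets pahole reports above */
	static_assert(offsetof(struct dinode_prefix_mock, di_big_nextents) == 24,
		      "union starts at offset 24");
	static_assert(offsetof(struct dinode_prefix_mock, di_flushiter) == 30,
		      "di_flushiter overlaps the last 2 bytes of di_big_nextents");
	return 0;
}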
* Re: [Bug report][fstests generic/047] Internal error !(flags & XFS_DABUF_MAP_HOLE_OK) at line 2572 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_dabuf_map.constprop.0+0x26c/0x368 [xfs]
  2023-11-07 15:13 ` Zorro Lang
@ 2023-11-08  6:38   ` Dave Chinner
  [not found]     ` <CAN=2_H+CdEK_rEUmYbmkCjSRqhX2cwi5yRHQcKAmKDPF16vqOw@mail.gmail.com>
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2023-11-08 6:38 UTC (permalink / raw)
To: Zorro Lang; +Cc: linux-xfs, fstests, Darrick J. Wong, Carlos Maiolino

On Tue, Nov 07, 2023 at 11:13:14PM +0800, Zorro Lang wrote:
> On Tue, Nov 07, 2023 at 07:13:39PM +1100, Dave Chinner wrote:
> > On Tue, Nov 07, 2023 at 04:05:22PM +0800, Zorro Lang wrote:
> > > On Tue, Nov 07, 2023 at 07:33:50AM +1100, Dave Chinner wrote:
> > > > On Tue, Nov 07, 2023 at 03:26:27AM +0800, Zorro Lang wrote:
> > > > > Thanks for your reply :) I tried to do a kernel bisect long time, but
> > > > > find nothing ... Then suddently, I found it's failed from a xfsprogs
> > > > > change [1].
> > > >
> > > > > Although that's not the root cause of this bug (on s390x), it just
> > > > > enabled "nrext64" by default, which I never tested on s390x before.
> > > > > For now, we know this's an issue about this feature, and only on
> > > > > s390x for now.
> > > >
> > > > That's not good. Can you please determine if this is a zero-day bug
> > > > with the nrext64 feature? I think it was merged in 5.19, so if you
> > > > could try to reproduce it on a 5.18 and 5.19 kernels first, that
> > > > would be handy.
> > >
> > > Unfortunately, it's a bug be there nearly from beginning. The linux v5.19
> > > can trigger this bug (with latest xfsprogs for-next branch):
> >
> > Ok. Can you grab the pahole output for the xfs_dinode and
> > xfs_log_dinode for s390 from both 5.18 and 5.19 kernel builds?
> > (i.e. 'pahole fs/xfs/xfs_inode.o |less' and search for the two
> > structures).
>
> I can't find xfs_log_dinode in fs/xfs/xfs_inode.o, but I can find both
> structures in fs/xfs/xfs_inode_item.o, so below output base on:
>
> # pahole fs/xfs/xfs_inode_item.o
>
> The output on v5.19 is [1], v5.18 output is [2], the diff of 5.18 and 5.19 is [3].

Ok, so there's nothing wrong with the on-disk format definition or
the journal format - they both lay out in exactly the right shape,
so I think at this point we need metadumps from the broken
filesystems.

Can you pick one of the failing tests and grab metadumps from the
shutdown filesystem (i.e. before it is recovered) and then another
from after it is recovered and the problem tripped over? I know I
won't be able to replay the log on x86-64, but knowing what is in
the journal vs what ends up being recovered will tell us where to
look next.

-Dave.
-- 
Dave Chinner
david@fromorbit.com
[parent not found: <CAN=2_H+CdEK_rEUmYbmkCjSRqhX2cwi5yRHQcKAmKDPF16vqOw@mail.gmail.com>]
* Re: [Bug report][fstests generic/047] Internal error !(flags & XFS_DABUF_MAP_HOLE_OK) at line 2572 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_dabuf_map.constprop.0+0x26c/0x368 [xfs]
       [not found] ` <CAN=2_H+CdEK_rEUmYbmkCjSRqhX2cwi5yRHQcKAmKDPF16vqOw@mail.gmail.com>
@ 2023-11-09  6:14   ` Dave Chinner
  2023-11-09 14:09     ` Zorro Lang
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2023-11-09 6:14 UTC (permalink / raw)
To: Zirong Lang; +Cc: linux-xfs, fstests, Darrick J. Wong, Carlos Maiolino

On Thu, Nov 09, 2023 at 10:43:58AM +0800, Zirong Lang wrote:
> By changing the generic/047 as below, I got 2 dump files and 2 log files.
> Please check the attachment,
> and feel free to tell me if you need more.

Well, we've narrowed it down to some weird recovery issue - the journal
is intact, recovery recovers the inode from the correct log item in
the journal, but the inode written to disk by recovery is corrupt.

What this points out is that we don't actually verify that the inode we
recover is valid before writeback is queued, nor do we detect the
specific corruption being encountered in the verifier (i.e. a non-zero
nblocks count when the extent count is zero).

Can you add the patch below and see if/when the verifier fires on
the reproducer? I'm particularly interested to know where it fires -
in recovery before writeback, or when the root inode is re-read from
disk. It doesn't fire on x86-64....

-Dave.
-- 
Dave Chinner
david@fromorbit.com

xfs: inode recovery does not validate the recovered inode

From: Dave Chinner <dchinner@redhat.com>

Discovered when trying to track down a weird recovery corruption
issue that wasn't detected at recovery time.

The specific corruption was a zero extent count field when big
extent counts are in use, and it turns out the dinode verifier
doesn't detect that specific corruption case, either. So fix it too.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_inode_buf.c   |  3 +++
 fs/xfs/xfs_inode_item_recover.c | 14 +++++++++++++-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index a35781577cad..0f970a0b3382 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -508,6 +508,9 @@ xfs_dinode_verify(
 	if (mode && nextents + naextents > nblocks)
 		return __this_address;
 
+	if (nextents + naextents == 0 && nblocks != 0)
+		return __this_address;
+
 	if (S_ISDIR(mode) && nextents > mp->m_dir_geo->max_extents)
 		return __this_address;
 
diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c
index 6b09e2bf2d74..f4c31c2b60d5 100644
--- a/fs/xfs/xfs_inode_item_recover.c
+++ b/fs/xfs/xfs_inode_item_recover.c
@@ -286,6 +286,7 @@ xlog_recover_inode_commit_pass2(
 	struct xfs_log_dinode	*ldip;
 	uint			isize;
 	int			need_free = 0;
+	xfs_failaddr_t		fa;
 
 	if (item->ri_buf[0].i_len == sizeof(struct xfs_inode_log_format)) {
 		in_f = item->ri_buf[0].i_addr;
@@ -529,8 +530,19 @@ xlog_recover_inode_commit_pass2(
 	    (dip->di_mode != 0))
 		error = xfs_recover_inode_owner_change(mp, dip, in_f,
 						buffer_list);
-	/* re-generate the checksum. */
+	/* re-generate the checksum and validate the recovered inode. */
 	xfs_dinode_calc_crc(log->l_mp, dip);
+	fa = xfs_dinode_verify(log->l_mp, in_f->ilf_ino, dip);
+	if (fa) {
+		XFS_CORRUPTION_ERROR(
+				"Bad dinode after recovery",
+				XFS_ERRLEVEL_LOW, mp, dip, sizeof(*dip));
+		xfs_alert(mp,
+			"Metadata corruption detected at %pS, inode 0x%llx",
+			fa, in_f->ilf_ino);
+		error = -EFSCORRUPTED;
+		goto out_release;
+	}
 
 	ASSERT(bp->b_mount == mp);
 	bp->b_flags |= _XBF_LOGRECOVERY;
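[The effect of the added verifier check can be illustrated outside the kernel with the field values that turn up later in this thread. This is a minimal stand-alone sketch; the helper name is made up for illustration and is not the kernel's:]

#include <stdint.h>
#include <stdio.h>

/* Mirrors the new check: an inode that owns blocks must map them
 * with at least one data or attr fork extent. */
static int zero_extents_but_owns_blocks(uint64_t nextents,
					uint64_t naextents,
					uint64_t nblocks)
{
	return nextents + naextents == 0 && nblocks != 0;
}

int main(void)
{
	/* inode 0x83 as recovered on s390x: 8 blocks, zero extents */
	printf("recovered inode 0x83: %s\n",
	       zero_extents_but_owns_blocks(0, 0, 8) ? "corrupt" : "ok");
	/* a freshly allocated empty inode: no blocks, no extents */
	printf("empty inode:          %s\n",
	       zero_extents_but_owns_blocks(0, 0, 0) ? "corrupt" : "ok");
	return 0;
}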
* Re: [Bug report][fstests generic/047] Internal error !(flags & XFS_DABUF_MAP_HOLE_OK) at line 2572 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_dabuf_map.constprop.0+0x26c/0x368 [xfs] 2023-11-09 6:14 ` Dave Chinner @ 2023-11-09 14:09 ` Zorro Lang 2023-11-09 23:13 ` Dave Chinner 0 siblings, 1 reply; 19+ messages in thread From: Zorro Lang @ 2023-11-09 14:09 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs, fstests, Darrick J. Wong, Carlos Maiolino On Thu, Nov 09, 2023 at 05:14:51PM +1100, Dave Chinner wrote: > On Thu, Nov 09, 2023 at 10:43:58AM +0800, Zirong Lang wrote: > > By changing the generic/047 as below, I got 2 dump files and 2 log files. > > Please check the attachment, > > and feel free to tell me if you need more. > > Well, we've narrowed down to some weird recovery issue - the journal > is intact, recovery recovers the inode from the correct log item in > the journal, but the inode written to disk by recovery is corrupt. > > What this points out is that we don't actually verify the inode we > recover is valid before writeback is queued, nor do we detect the > specific corruption being encountered in the verifier (i.e. non-zero > nblocks count when extent count is zero). > > Can you add the patch below and see if/when the verifier fires on > the reproducer? I'm particularly interested to know where it fires - > in recovery before writeback, or when the root inode is re-read from > disk. It doesn't fire on x64-64.... Hi Dave, I just re-tested with your patch, the steps as [1]. The trace-cmd output can be downloaded from [2]. Please tell me if I misunderstood anything. Thanks, Zorro [1] # ./check generic/047 FSTYP -- xfs (debug) PLATFORM -- Linux/s390x ibm-z-507 6.6.0+ #1 SMP Wed Nov 8 12:51:20 EST 2023 MKFS_OPTIONS -- -f -m crc=1,finobt=1,rmapbt=0,reflink=1,bigtime=1,inobtcount=1 /dev/loop1 MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/loop1 /mnt/fstests/SCRATCH_DIR generic/047 [failed, exit status 1]_check_dmesg: something found in dmesg (see /var/lib/xfstests/results//generic/047.dmesg) - output mismatch (see /var/lib/xfstests/results//generic/047.out.bad) --- tests/generic/047.out 2023-11-08 13:02:38.424036829 -0500 +++ /var/lib/xfstests/results//generic/047.out.bad 2023-11-09 08:41:50.026163756 -0500 @@ -1 +1,4 @@ QA output created by 047 +mount: /mnt/fstests/SCRATCH_DIR: mount(2) system call failed: Structure needs cleaning. +mount -o context=system_u:object_r:root_t:s0 /dev/loop1 /mnt/fstests/SCRATCH_DIR failed +(see /var/lib/xfstests/results//generic/047.full for details) ... (Run 'diff -u /var/lib/xfstests/tests/generic/047.out /var/lib/xfstests/results//generic/047.out.bad' to see the entire diff) Ran: generic/047 Failures: generic/047 Failed 1 of 1 tests # mount /dev/loop1 /mnt/fstests/SCRATCH_DIR mount: /mnt/fstests/SCRATCH_DIR: mount(2) system call failed: Structure needs cleaning. # dmesg [ 8272.820617] run fstests generic/047 at 2023-11-09 08:36:39 [ 8273.231657] XFS (loop1): Mounting V5 Filesystem 9faf1858-4d74-4b58-b0e7-68fd1d90762a [ 8273.233769] XFS (loop1): Ending clean mount [ 8273.235510] XFS (loop1): User initiated shutdown received. [ 8273.235520] XFS (loop1): Metadata I/O Error (0x4) detected at xfs_fs_goingdown+0x5a/0xf8 [xfs] (fs/xfs/xfs_fsops.c:492). Shutting down filesystem. 
[ 8273.235772] XFS (loop1): Please unmount the filesystem and rectify the problem(s) [ 8273.236810] XFS (loop1): Unmounting Filesystem 9faf1858-4d74-4b58-b0e7-68fd1d90762a [ 8273.284005] XFS (loop1): Mounting V5 Filesystem 890f6068-8eb1-4afa-b1fb-5deb3d937fce [ 8273.285668] XFS (loop1): Ending clean mount [ 8275.789831] XFS (loop1): User initiated shutdown received. [ 8275.789846] XFS (loop1): Log I/O Error (0x6) detected at xfs_fs_goingdown+0xb4/0xf8 [xfs] (fs/xfs/xfs_fsops.c:495). Shutting down filesystem. [ 8275.790170] XFS (loop1): Please unmount the filesystem and rectify the problem(s) [ 8275.795171] XFS (loop1): Unmounting Filesystem 890f6068-8eb1-4afa-b1fb-5deb3d937fce [ 8282.714748] XFS (loop1): Mounting V5 Filesystem 890f6068-8eb1-4afa-b1fb-5deb3d937fce [ 8282.728494] XFS (loop1): Starting recovery (logdev: internal) [ 8282.741350] 00000000: 49 4e 81 80 03 02 00 00 00 00 00 00 00 00 00 00 IN.............. [ 8282.741357] 00000010: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 8282.741360] 00000020: 35 63 5c e5 7f d2 65 f2 35 63 5c e5 7f d2 65 f2 5c\...e.5c\...e. [ 8282.741363] 00000030: 35 63 5c e5 7f d2 65 f2 00 00 00 00 00 00 80 00 5c\...e......... [ 8282.741366] 00000040: 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 00 ................ [ 8282.741368] 00000050: 00 00 00 02 00 00 00 00 00 00 00 00 5d b8 45 2b ............].E+ [ 8282.741371] 00000060: ff ff ff ff 77 bb a7 2f 00 00 00 00 00 00 00 04 ....w../........ [ 8282.741375] 00000070: 00 00 00 01 00 00 00 02 00 00 00 00 00 00 00 18 ................ [ 8282.741378] 00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 8282.741381] 00000090: 35 63 5c e5 7f d2 65 f2 00 00 00 00 00 00 00 83 5c\...e......... [ 8282.741397] 000000a0: 89 0f 60 68 8e b1 4a fa b1 fb 5d eb 3d 93 7f ce ..`h..J...].=... [ 8282.741399] XFS (loop1): Internal error Bad dinode after recovery at line 536 of file fs/xfs/xfs_inode_item_recover.c. 
Caller xlog_recover_items_pass2+0x72/0xf0 [xfs] [ 8282.741739] CPU: 1 PID: 7645 Comm: mount Kdump: loaded Not tainted 6.6.0+ #1 [ 8282.741743] Hardware name: IBM 8561 LT1 400 (z/VM 7.2.0) [ 8282.741746] Call Trace: [ 8282.741748] [<000000004e7af67a>] dump_stack_lvl+0x62/0x80 [ 8282.741756] [<000003ff800b1cb0>] xfs_corruption_error+0x70/0xa0 [xfs] [ 8282.741863] [<000003ff800f0032>] xlog_recover_inode_commit_pass2+0x63a/0xb10 [xfs] [ 8282.741973] [<000003ff800f400a>] xlog_recover_items_pass2+0x72/0xf0 [xfs] [ 8282.742082] [<000003ff800f4cc6>] xlog_recover_commit_trans+0x39e/0x3c0 [xfs] [ 8282.742192] [<000003ff800f4e28>] xlog_recovery_process_trans+0x140/0x148 [xfs] [ 8282.742302] [<000003ff800f4ef0>] xlog_recover_process_ophdr+0xc0/0x180 [xfs] [ 8282.742413] [<000003ff800f59b6>] xlog_recover_process_data+0xb6/0x168 [xfs] [ 8282.742528] [<000003ff800f5b6c>] xlog_recover_process+0x104/0x150 [xfs] [ 8282.742638] [<000003ff800f5f6a>] xlog_do_recovery_pass+0x3b2/0x748 [xfs] [ 8282.742747] [<000003ff800f6758>] xlog_do_log_recovery+0x88/0xd8 [xfs] [ 8282.742856] [<000003ff800f67f4>] xlog_do_recover+0x4c/0x218 [xfs] [ 8282.742965] [<000003ff800f7cfa>] xlog_recover+0xda/0x1a0 [xfs] [ 8282.743073] [<000003ff800dde96>] xfs_log_mount+0x11e/0x280 [xfs] [ 8282.743181] [<000003ff800cfcc6>] xfs_mountfs+0x3e6/0x928 [xfs] [ 8282.743289] [<000003ff800d7554>] xfs_fs_fill_super+0x40c/0x7d8 [xfs] [ 8282.743397] [<000000004e160fec>] get_tree_bdev+0x144/0x1d0 [ 8282.743403] [<000000004e15ea20>] vfs_get_tree+0x38/0x110 [ 8282.743406] [<000000004e19140a>] do_new_mount+0x17a/0x2d0 [ 8282.743411] [<000000004e192084>] path_mount+0x1ac/0x818 [ 8282.743414] [<000000004e1927f4>] __s390x_sys_mount+0x104/0x148 [ 8282.743417] [<000000004e7d5910>] __do_syscall+0x1d0/0x1f8 [ 8282.743420] [<000000004e7e5bc0>] system_call+0x70/0x98 [ 8282.743423] XFS (loop1): Corruption detected. Unmount and run xfs_repair [ 8282.743425] XFS (loop1): Metadata corruption detected at xfs_dinode_verify.part.0+0x2c6/0x580 [xfs], inode 0x83 [ 8282.743582] XFS (loop1): log mount/recovery failed: error -117 [ 8282.743846] XFS (loop1): log mount failed [ 8282.778368] XFS (loop0): Unmounting Filesystem a488de9e-d346-4098-aff4-84e6ca6936c7 [ 8458.453977] XFS (loop1): Mounting V5 Filesystem 890f6068-8eb1-4afa-b1fb-5deb3d937fce [ 8458.467682] XFS (loop1): Starting recovery (logdev: internal) ... ... [ 8611.351331] XFS (loop1): Internal error Bad dinode after recovery at line 536 of file fs/xfs/xfs_inode_item_recover.c. 
Caller xlog_recover_items_pass2+0x72/0xf0 [xfs] [ 8611.351612] CPU: 1 PID: 10548 Comm: mount Kdump: loaded Not tainted 6.6.0+ #1 [ 8611.351614] Hardware name: IBM 8561 LT1 400 (z/VM 7.2.0) [ 8611.351615] Call Trace: [ 8611.351617] [<000000004e7af67a>] dump_stack_lvl+0x62/0x80 [ 8611.351624] [<000003ff800b1cb0>] xfs_corruption_error+0x70/0xa0 [xfs] [ 8611.351707] [<000003ff800f0032>] xlog_recover_inode_commit_pass2+0x63a/0xb10 [xfs] [ 8611.351792] [<000003ff800f400a>] xlog_recover_items_pass2+0x72/0xf0 [xfs] [ 8611.351886] [<000003ff800f4cc6>] xlog_recover_commit_trans+0x39e/0x3c0 [xfs] [ 8611.351989] [<000003ff800f4e28>] xlog_recovery_process_trans+0x140/0x148 [xfs] [ 8611.352073] [<000003ff800f4ef0>] xlog_recover_process_ophdr+0xc0/0x180 [xfs] [ 8611.352157] [<000003ff800f59b6>] xlog_recover_process_data+0xb6/0x168 [xfs] [ 8611.352243] [<000003ff800f5b6c>] xlog_recover_process+0x104/0x150 [xfs] [ 8611.352326] [<000003ff800f5f6a>] xlog_do_recovery_pass+0x3b2/0x748 [xfs] [ 8611.352409] [<000003ff800f6758>] xlog_do_log_recovery+0x88/0xd8 [xfs] [ 8611.352497] [<000003ff800f67f4>] xlog_do_recover+0x4c/0x218 [xfs] [ 8611.352582] [<000003ff800f7cfa>] xlog_recover+0xda/0x1a0 [xfs] [ 8611.352668] [<000003ff800dde96>] xfs_log_mount+0x11e/0x280 [xfs] [ 8611.352767] [<000003ff800cfcc6>] xfs_mountfs+0x3e6/0x928 [xfs] [ 8611.352851] [<000003ff800d7554>] xfs_fs_fill_super+0x40c/0x7d8 [xfs] [ 8611.352935] [<000000004e160fec>] get_tree_bdev+0x144/0x1d0 [ 8611.352941] [<000000004e15ea20>] vfs_get_tree+0x38/0x110 [ 8611.352944] [<000000004e19140a>] do_new_mount+0x17a/0x2d0 [ 8611.352948] [<000000004e192084>] path_mount+0x1ac/0x818 [ 8611.352950] [<000000004e1927f4>] __s390x_sys_mount+0x104/0x148 [ 8611.352953] [<000000004e7d5910>] __do_syscall+0x1d0/0x1f8 [ 8611.352955] [<000000004e7e5bc0>] system_call+0x70/0x98 [ 8611.352958] XFS (loop1): Corruption detected. Unmount and run xfs_repair [ 8611.352960] XFS (loop1): Metadata corruption detected at xfs_dinode_verify.part.0+0x2c6/0x580 [xfs], inode 0x83 [ 8611.353061] XFS (loop1): log mount/recovery failed: error -117 [ 8611.353227] XFS (loop1): log mount failed # trace-cmd record -e xfs\* -e xlog\* -e printk mount /dev/loop1 /mnt/test # trace-cmd report > events-with-dave-patch.txt [2] https://drive.google.com/file/d/1EeWhmhS9yETegPM4oPcAe5Nuj5fUBu3V/view?usp=sharing > > -Dave. > -- > Dave Chinner > david@fromorbit.com > > xfs: inode recovery does not validate the recovered inode > > From: Dave Chinner <dchinner@redhat.com> > > Discovered when trying to track down a weird recovery corruption > issue that wasn't detected at recovery time. > > The specific corruption was a zero extent count field when big > extent counts are in use, and it turns out the dinode verifier > doesn't detect that specific corruption case, either. So fix it too. 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com> > --- > fs/xfs/libxfs/xfs_inode_buf.c | 3 +++ > fs/xfs/xfs_inode_item_recover.c | 14 +++++++++++++- > 2 files changed, 16 insertions(+), 1 deletion(-) > > diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c > index a35781577cad..0f970a0b3382 100644 > --- a/fs/xfs/libxfs/xfs_inode_buf.c > +++ b/fs/xfs/libxfs/xfs_inode_buf.c > @@ -508,6 +508,9 @@ xfs_dinode_verify( > if (mode && nextents + naextents > nblocks) > return __this_address; > > + if (nextents + naextents == 0 && nblocks != 0) > + return __this_address; > + > if (S_ISDIR(mode) && nextents > mp->m_dir_geo->max_extents) > return __this_address; > > diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c > index 6b09e2bf2d74..f4c31c2b60d5 100644 > --- a/fs/xfs/xfs_inode_item_recover.c > +++ b/fs/xfs/xfs_inode_item_recover.c > @@ -286,6 +286,7 @@ xlog_recover_inode_commit_pass2( > struct xfs_log_dinode *ldip; > uint isize; > int need_free = 0; > + xfs_failaddr_t fa; > > if (item->ri_buf[0].i_len == sizeof(struct xfs_inode_log_format)) { > in_f = item->ri_buf[0].i_addr; > @@ -529,8 +530,19 @@ xlog_recover_inode_commit_pass2( > (dip->di_mode != 0)) > error = xfs_recover_inode_owner_change(mp, dip, in_f, > buffer_list); > - /* re-generate the checksum. */ > + /* re-generate the checksum and validate the recovered inode. */ > xfs_dinode_calc_crc(log->l_mp, dip); > + fa = xfs_dinode_verify(log->l_mp, in_f->ilf_ino, dip); > + if (fa) { > + XFS_CORRUPTION_ERROR( > + "Bad dinode after recovery", > + XFS_ERRLEVEL_LOW, mp, dip, sizeof(*dip)); > + xfs_alert(mp, > + "Metadata corruption detected at %pS, inode 0x%llx", > + fa, in_f->ilf_ino); > + error = -EFSCORRUPTED; > + goto out_release; > + } > > ASSERT(bp->b_mount == mp); > bp->b_flags |= _XBF_LOGRECOVERY; > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bug report][fstests generic/047] Internal error !(flags & XFS_DABUF_MAP_HOLE_OK) at line 2572 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_dabuf_map.constprop.0+0x26c/0x368 [xfs]
  2023-11-09 14:09 ` Zorro Lang
@ 2023-11-09 23:13   ` Dave Chinner
  2023-11-10  1:36     ` Zorro Lang
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2023-11-09 23:13 UTC (permalink / raw)
To: Zorro Lang; +Cc: linux-xfs, fstests, Darrick J. Wong, Carlos Maiolino

On Thu, Nov 09, 2023 at 10:09:29PM +0800, Zorro Lang wrote:
> On Thu, Nov 09, 2023 at 05:14:51PM +1100, Dave Chinner wrote:
> > On Thu, Nov 09, 2023 at 10:43:58AM +0800, Zirong Lang wrote:
> > > By changing the generic/047 as below, I got 2 dump files and 2 log files.
> > > Please check the attachment,
> > > and feel free to tell me if you need more.
> >
> > Well, we've narrowed it down to some weird recovery issue - the journal
> > is intact, recovery recovers the inode from the correct log item in
> > the journal, but the inode written to disk by recovery is corrupt.
> >
> > What this points out is that we don't actually verify that the inode we
> > recover is valid before writeback is queued, nor do we detect the
> > specific corruption being encountered in the verifier (i.e. a non-zero
> > nblocks count when the extent count is zero).
> >
> > Can you add the patch below and see if/when the verifier fires on
> > the reproducer? I'm particularly interested to know where it fires -
> > in recovery before writeback, or when the root inode is re-read from
> > disk. It doesn't fire on x86-64....
>
> Hi Dave,
>
> I just re-tested with your patch, the steps as [1]. The trace-cmd output
> can be downloaded from [2].
>
> Please tell me if I misunderstood anything.

You got it right, that's exactly what I wanted to see. :)

....

> [ 8272.820617] run fstests generic/047 at 2023-11-09 08:36:39
> [ 8273.231657] XFS (loop1): Mounting V5 Filesystem 9faf1858-4d74-4b58-b0e7-68fd1d90762a
> [ 8273.233769] XFS (loop1): Ending clean mount
> [ 8273.235510] XFS (loop1): User initiated shutdown received.
> [ 8273.235520] XFS (loop1): Metadata I/O Error (0x4) detected at xfs_fs_goingdown+0x5a/0xf8 [xfs] (fs/xfs/xfs_fsops.c:492). Shutting down filesystem.
> [ 8273.235772] XFS (loop1): Please unmount the filesystem and rectify the problem(s)
> [ 8273.236810] XFS (loop1): Unmounting Filesystem 9faf1858-4d74-4b58-b0e7-68fd1d90762a
> [ 8273.284005] XFS (loop1): Mounting V5 Filesystem 890f6068-8eb1-4afa-b1fb-5deb3d937fce
> [ 8273.285668] XFS (loop1): Ending clean mount
> [ 8275.789831] XFS (loop1): User initiated shutdown received.
> [ 8275.789846] XFS (loop1): Log I/O Error (0x6) detected at xfs_fs_goingdown+0xb4/0xf8 [xfs] (fs/xfs/xfs_fsops.c:495). Shutting down filesystem.
> [ 8275.790170] XFS (loop1): Please unmount the filesystem and rectify the problem(s)
> [ 8275.795171] XFS (loop1): Unmounting Filesystem 890f6068-8eb1-4afa-b1fb-5deb3d937fce
> [ 8282.714748] XFS (loop1): Mounting V5 Filesystem 890f6068-8eb1-4afa-b1fb-5deb3d937fce
> [ 8282.728494] XFS (loop1): Starting recovery (logdev: internal)
> [ 8282.741350] 00000000: 49 4e 81 80 03 02 00 00 00 00 00 00 00 00 00 00  IN..............
> [ 8282.741357] 00000010: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [ 8282.741360] 00000020: 35 63 5c e5 7f d2 65 f2 35 63 5c e5 7f d2 65 f2  5c\...e.5c\...e.
> [ 8282.741363] 00000030: 35 63 5c e5 7f d2 65 f2 00 00 00 00 00 00 80 00  5c\...e.........
> [ 8282.741366] 00000040: 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 00  ................
> [ 8282.741368] 00000050: 00 00 00 02 00 00 00 00 00 00 00 00 5d b8 45 2b  ............].E+
> [ 8282.741371] 00000060: ff ff ff ff 77 bb a7 2f 00 00 00 00 00 00 00 04  ....w../........
> [ 8282.741375] 00000070: 00 00 00 01 00 00 00 02 00 00 00 00 00 00 00 18  ................
> [ 8282.741378] 00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [ 8282.741381] 00000090: 35 63 5c e5 7f d2 65 f2 00 00 00 00 00 00 00 83  5c\...e.........
> [ 8282.741397] 000000a0: 89 0f 60 68 8e b1 4a fa b1 fb 5d eb 3d 93 7f ce  ..`h..J...].=...

Ok, so that's the inode core that was recovered and it's in memory
before being written to disk. It's clearly not been recovered
correctly - as it's triggered the new extent vs block count check
I added to the verifier.

> [ 8282.741399] XFS (loop1): Internal error Bad dinode after recovery at line 536 of file fs/xfs/xfs_inode_item_recover.c. Caller xlog_recover_items_pass2+0x72/0xf0 [xfs]
> [ 8282.741739] CPU: 1 PID: 7645 Comm: mount Kdump: loaded Not tainted 6.6.0+ #1
> [ 8282.741743] Hardware name: IBM 8561 LT1 400 (z/VM 7.2.0)
> [ 8282.741746] Call Trace:
> [ 8282.741748]  [<000000004e7af67a>] dump_stack_lvl+0x62/0x80
> [ 8282.741756]  [<000003ff800b1cb0>] xfs_corruption_error+0x70/0xa0 [xfs]
> [ 8282.741863]  [<000003ff800f0032>] xlog_recover_inode_commit_pass2+0x63a/0xb10 [xfs]
> [ 8282.741973]  [<000003ff800f400a>] xlog_recover_items_pass2+0x72/0xf0 [xfs]
> [ 8282.742082]  [<000003ff800f4cc6>] xlog_recover_commit_trans+0x39e/0x3c0 [xfs]
> [ 8282.742192]  [<000003ff800f4e28>] xlog_recovery_process_trans+0x140/0x148 [xfs]
> [ 8282.742302]  [<000003ff800f4ef0>] xlog_recover_process_ophdr+0xc0/0x180 [xfs]
> [ 8282.742413]  [<000003ff800f59b6>] xlog_recover_process_data+0xb6/0x168 [xfs]
> [ 8282.742528]  [<000003ff800f5b6c>] xlog_recover_process+0x104/0x150 [xfs]
> [ 8282.742638]  [<000003ff800f5f6a>] xlog_do_recovery_pass+0x3b2/0x748 [xfs]
> [ 8282.742747]  [<000003ff800f6758>] xlog_do_log_recovery+0x88/0xd8 [xfs]
> [ 8282.742856]  [<000003ff800f67f4>] xlog_do_recover+0x4c/0x218 [xfs]
> [ 8282.742965]  [<000003ff800f7cfa>] xlog_recover+0xda/0x1a0 [xfs]
> [ 8282.743073]  [<000003ff800dde96>] xfs_log_mount+0x11e/0x280 [xfs]
> [ 8282.743181]  [<000003ff800cfcc6>] xfs_mountfs+0x3e6/0x928 [xfs]
> [ 8282.743289]  [<000003ff800d7554>] xfs_fs_fill_super+0x40c/0x7d8 [xfs]
> [ 8282.743397]  [<000000004e160fec>] get_tree_bdev+0x144/0x1d0
> [ 8282.743403]  [<000000004e15ea20>] vfs_get_tree+0x38/0x110
> [ 8282.743406]  [<000000004e19140a>] do_new_mount+0x17a/0x2d0
> [ 8282.743411]  [<000000004e192084>] path_mount+0x1ac/0x818
> [ 8282.743414]  [<000000004e1927f4>] __s390x_sys_mount+0x104/0x148
> [ 8282.743417]  [<000000004e7d5910>] __do_syscall+0x1d0/0x1f8
> [ 8282.743420]  [<000000004e7e5bc0>] system_call+0x70/0x98
> [ 8282.743423] XFS (loop1): Corruption detected. Unmount and run xfs_repair
> [ 8282.743425] XFS (loop1): Metadata corruption detected at xfs_dinode_verify.part.0+0x2c6/0x580 [xfs], inode 0x83

Ok inode 0x83 - that's not the root inode, so slightly unexpected.
The trace, however, tells me that this is the first inode recovered
from the checkpoint (the root inode is the other in the checkpoint)
and we knew that the regular file inodes were bad, too.

Essentially, the inode dump tells us that everything but the extent
counts were recovered correctly from the inode log item. I guess we
now need to do specific trace_printk debugging to find out what,
exactly, is being processed incorrectly on s390.

Can you add the debug patch below on top of the corruption detection
patch? If it fails, run the same trace-cmd on the failed mount as
you've done here?

If it does not fail, it would still be good to get a trace of the
mount that performs recovery successfully. To do that you'll have to
modify the test to trace the specific mount rather than do it
separately after the test fails...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

xfs: s390 inode recovery debug

---
 fs/xfs/xfs_inode_item_recover.c | 44 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c
index f4c31c2b60d5..2a0166702192 100644
--- a/fs/xfs/xfs_inode_item_recover.c
+++ b/fs/xfs/xfs_inode_item_recover.c
@@ -165,6 +165,41 @@ xfs_log_dinode_to_disk_iext_counters(
 
 }
 
+static void dump_inodes(
+	struct xfs_log_dinode	*ldip,
+	struct xfs_dinode	*dip,
+	char			*ctx)
+{
+	smp_mb();
+
+	trace_printk(
+"disk ino 0x%llx: %s nblocks 0x%llx nextents 0x%x/0x%llx anextents 0x%x/0x%llx v3_pad 0x%llx nrext64_pad 0x%x di_flags2 0x%llx",
+		be64_to_cpu(dip->di_ino),
+		ctx,
+		be64_to_cpu(dip->di_nblocks),
+		be32_to_cpu(dip->di_nextents),
+		be64_to_cpu(dip->di_big_nextents),
+		be32_to_cpu(dip->di_anextents),
+		be64_to_cpu(dip->di_big_anextents),
+		be64_to_cpu(dip->di_v3_pad),
+		be16_to_cpu(dip->di_nrext64_pad),
+		be64_to_cpu(dip->di_flags2));
+	trace_printk(
+"log ino 0x%llx: %s nblocks 0x%llx nextents 0x%x/0x%llx anextents 0x%x/0x%x v3_pad 0x%llx nrext64_pad 0x%x di_flags2 0x%llx %s",
+		ldip->di_ino,
+		ctx,
+		ldip->di_nblocks,
+		ldip->di_nextents,
+		ldip->di_big_nextents,
+		ldip->di_anextents,
+		ldip->di_big_anextents,
+		ldip->di_v3_pad,
+		ldip->di_nrext64_pad,
+		ldip->di_flags2,
+		xfs_log_dinode_has_large_extent_counts(ldip) ? "big" : "small");
+	smp_mb();
+}
+
 STATIC void
 xfs_log_dinode_to_disk(
 	struct xfs_log_dinode	*from,
@@ -196,6 +231,8 @@ xfs_log_dinode_to_disk(
 	to->di_flags = cpu_to_be16(from->di_flags);
 	to->di_gen = cpu_to_be32(from->di_gen);
 
+	dump_inodes(from, to, "before v3");
+
 	if (from->di_version == 3) {
 		to->di_changecount = cpu_to_be64(from->di_changecount);
 		to->di_crtime = xfs_log_dinode_to_disk_ts(from,
@@ -212,6 +249,8 @@ xfs_log_dinode_to_disk(
 		memset(to->di_v2_pad, 0, sizeof(to->di_v2_pad));
 	}
 
+	dump_inodes(from, to, "before iext");
+
 	xfs_log_dinode_to_disk_iext_counters(from, to);
 }
 
@@ -337,6 +376,7 @@ xlog_recover_inode_commit_pass2(
 		goto out_release;
 	}
 
+	dump_inodes(ldip, dip, "init");
 	/*
 	 * If the inode has an LSN in it, recover the inode only if the on-disk
 	 * inode's LSN is older than the lsn of the transaction we are
@@ -441,6 +481,7 @@ xlog_recover_inode_commit_pass2(
 		goto out_release;
 	}
 
+	dump_inodes(ldip, dip, "pre");
 	/*
 	 * Recover the log dinode inode into the on disk inode.
 	 *
@@ -453,6 +494,8 @@ xlog_recover_inode_commit_pass2(
 	 */
 	xfs_log_dinode_to_disk(ldip, dip, current_lsn);
 
+	dump_inodes(ldip, dip, "post");
+
 	fields = in_f->ilf_fields;
 	if (fields & XFS_ILOG_DEV)
 		xfs_dinode_put_rdev(dip, in_f->ilf_u.ilfu_rdev);
@@ -530,6 +573,7 @@ xlog_recover_inode_commit_pass2(
 	    (dip->di_mode != 0))
 		error = xfs_recover_inode_owner_change(mp, dip, in_f,
 						buffer_list);
+	dump_inodes(ldip, dip, "done");
 	/* re-generate the checksum and validate the recovered inode. */
 	xfs_dinode_calc_crc(log->l_mp, dip);
 	fa = xfs_dinode_verify(log->l_mp, in_f->ilf_ino, dip);
* Re: [Bug report][fstests generic/047] Internal error !(flags & XFS_DABUF_MAP_HOLE_OK) at line 2572 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_dabuf_map.constprop.0+0x26c/0x368 [xfs] 2023-11-09 23:13 ` Dave Chinner @ 2023-11-10 1:36 ` Zorro Lang 2023-11-10 2:03 ` Dave Chinner 0 siblings, 1 reply; 19+ messages in thread From: Zorro Lang @ 2023-11-10 1:36 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs, fstests, Darrick J. Wong, Carlos Maiolino On Fri, Nov 10, 2023 at 10:13:26AM +1100, Dave Chinner wrote: > On Thu, Nov 09, 2023 at 10:09:29PM +0800, Zorro Lang wrote: > > On Thu, Nov 09, 2023 at 05:14:51PM +1100, Dave Chinner wrote: > > > On Thu, Nov 09, 2023 at 10:43:58AM +0800, Zirong Lang wrote: > > > > By changing the generic/047 as below, I got 2 dump files and 2 log files. > > > > Please check the attachment, > > > > and feel free to tell me if you need more. > > > > > > Well, we've narrowed down to some weird recovery issue - the journal > > > is intact, recovery recovers the inode from the correct log item in > > > the journal, but the inode written to disk by recovery is corrupt. > > > > > > What this points out is that we don't actually verify the inode we > > > recover is valid before writeback is queued, nor do we detect the > > > specific corruption being encountered in the verifier (i.e. non-zero > > > nblocks count when extent count is zero). > > > > > > Can you add the patch below and see if/when the verifier fires on > > > the reproducer? I'm particularly interested to know where it fires - > > > in recovery before writeback, or when the root inode is re-read from > > > disk. It doesn't fire on x64-64.... > > > > Hi Dave, > > > > I just re-tested with your patch, the steps as [1]. The trace-cmd output > > can be downloaded from [2]. > > > > Please tell me if I misunderstood anything. > > You got it right, that's exactly what I wanted to see. :) > > .... > > [ 8272.820617] run fstests generic/047 at 2023-11-09 08:36:39 > > [ 8273.231657] XFS (loop1): Mounting V5 Filesystem 9faf1858-4d74-4b58-b0e7-68fd1d90762a > > [ 8273.233769] XFS (loop1): Ending clean mount > > [ 8273.235510] XFS (loop1): User initiated shutdown received. > > [ 8273.235520] XFS (loop1): Metadata I/O Error (0x4) detected at xfs_fs_goingdown+0x5a/0xf8 [xfs] (fs/xfs/xfs_fsops.c:492). Shutting down filesystem. > > [ 8273.235772] XFS (loop1): Please unmount the filesystem and rectify the problem(s) > > [ 8273.236810] XFS (loop1): Unmounting Filesystem 9faf1858-4d74-4b58-b0e7-68fd1d90762a > > [ 8273.284005] XFS (loop1): Mounting V5 Filesystem 890f6068-8eb1-4afa-b1fb-5deb3d937fce > > [ 8273.285668] XFS (loop1): Ending clean mount > > [ 8275.789831] XFS (loop1): User initiated shutdown received. > > [ 8275.789846] XFS (loop1): Log I/O Error (0x6) detected at xfs_fs_goingdown+0xb4/0xf8 [xfs] (fs/xfs/xfs_fsops.c:495). Shutting down filesystem. > > [ 8275.790170] XFS (loop1): Please unmount the filesystem and rectify the problem(s) > > [ 8275.795171] XFS (loop1): Unmounting Filesystem 890f6068-8eb1-4afa-b1fb-5deb3d937fce > > [ 8282.714748] XFS (loop1): Mounting V5 Filesystem 890f6068-8eb1-4afa-b1fb-5deb3d937fce > > [ 8282.728494] XFS (loop1): Starting recovery (logdev: internal) > > [ 8282.741350] 00000000: 49 4e 81 80 03 02 00 00 00 00 00 00 00 00 00 00 IN.............. > > [ 8282.741357] 00000010: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 ................ > > [ 8282.741360] 00000020: 35 63 5c e5 7f d2 65 f2 35 63 5c e5 7f d2 65 f2 5c\...e.5c\...e. 
> > [ 8282.741363] 00000030: 35 63 5c e5 7f d2 65 f2 00 00 00 00 00 00 80 00 5c\...e......... > > [ 8282.741366] 00000040: 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 00 ................ > > [ 8282.741368] 00000050: 00 00 00 02 00 00 00 00 00 00 00 00 5d b8 45 2b ............].E+ > > [ 8282.741371] 00000060: ff ff ff ff 77 bb a7 2f 00 00 00 00 00 00 00 04 ....w../........ > > [ 8282.741375] 00000070: 00 00 00 01 00 00 00 02 00 00 00 00 00 00 00 18 ................ > > [ 8282.741378] 00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > > [ 8282.741381] 00000090: 35 63 5c e5 7f d2 65 f2 00 00 00 00 00 00 00 83 5c\...e......... > > [ 8282.741397] 000000a0: 89 0f 60 68 8e b1 4a fa b1 fb 5d eb 3d 93 7f ce ..`h..J...].=... > > Ok, so that's the inode core that was recovered and it's in memory > before being written to disk. It's clearly not been recovered > correctly - as it's triggered the new extent vs block count check > I added to the verifier. > > > [ 8282.741399] XFS (loop1): Internal error Bad dinode after recovery at line 536 of file fs/xfs/xfs_inode_item_recover.c. Caller xlog_recover_items_pass2+0x72/0xf0 [xfs] > > [ 8282.741739] CPU: 1 PID: 7645 Comm: mount Kdump: loaded Not tainted 6.6.0+ #1 > > [ 8282.741743] Hardware name: IBM 8561 LT1 400 (z/VM 7.2.0) > > [ 8282.741746] Call Trace: > > [ 8282.741748] [<000000004e7af67a>] dump_stack_lvl+0x62/0x80 > > [ 8282.741756] [<000003ff800b1cb0>] xfs_corruption_error+0x70/0xa0 [xfs] > > [ 8282.741863] [<000003ff800f0032>] xlog_recover_inode_commit_pass2+0x63a/0xb10 [xfs] > > [ 8282.741973] [<000003ff800f400a>] xlog_recover_items_pass2+0x72/0xf0 [xfs] > > [ 8282.742082] [<000003ff800f4cc6>] xlog_recover_commit_trans+0x39e/0x3c0 [xfs] > > [ 8282.742192] [<000003ff800f4e28>] xlog_recovery_process_trans+0x140/0x148 [xfs] > > [ 8282.742302] [<000003ff800f4ef0>] xlog_recover_process_ophdr+0xc0/0x180 [xfs] > > [ 8282.742413] [<000003ff800f59b6>] xlog_recover_process_data+0xb6/0x168 [xfs] > > [ 8282.742528] [<000003ff800f5b6c>] xlog_recover_process+0x104/0x150 [xfs] > > [ 8282.742638] [<000003ff800f5f6a>] xlog_do_recovery_pass+0x3b2/0x748 [xfs] > > [ 8282.742747] [<000003ff800f6758>] xlog_do_log_recovery+0x88/0xd8 [xfs] > > [ 8282.742856] [<000003ff800f67f4>] xlog_do_recover+0x4c/0x218 [xfs] > > [ 8282.742965] [<000003ff800f7cfa>] xlog_recover+0xda/0x1a0 [xfs] > > [ 8282.743073] [<000003ff800dde96>] xfs_log_mount+0x11e/0x280 [xfs] > > [ 8282.743181] [<000003ff800cfcc6>] xfs_mountfs+0x3e6/0x928 [xfs] > > [ 8282.743289] [<000003ff800d7554>] xfs_fs_fill_super+0x40c/0x7d8 [xfs] > > [ 8282.743397] [<000000004e160fec>] get_tree_bdev+0x144/0x1d0 > > [ 8282.743403] [<000000004e15ea20>] vfs_get_tree+0x38/0x110 > > [ 8282.743406] [<000000004e19140a>] do_new_mount+0x17a/0x2d0 > > [ 8282.743411] [<000000004e192084>] path_mount+0x1ac/0x818 > > [ 8282.743414] [<000000004e1927f4>] __s390x_sys_mount+0x104/0x148 > > [ 8282.743417] [<000000004e7d5910>] __do_syscall+0x1d0/0x1f8 > > [ 8282.743420] [<000000004e7e5bc0>] system_call+0x70/0x98 > > [ 8282.743423] XFS (loop1): Corruption detected. Unmount and run xfs_repair > > [ 8282.743425] XFS (loop1): Metadata corruption detected at xfs_dinode_verify.part.0+0x2c6/0x580 [xfs], inode 0x83 > > Ok inode 0x83 - that's not the root inode, so slightly unexpected. > The trace, however, tells me that this is the first inode recovered > from the checkpoint (the root inode is the other in the checkpoint) > and we knew that the regular file inodes were bad, too. 
> > Essentially, the inode dump tells use that everything but the extent > counts were recovered correctly from the inode log item. I guess we > now need to do specific trace_printk debugging to find out what, > exactly, is being processed incorrectly on s390. > > Can you add the debug patch below on top of the corruption detection > patch? If it fails, run the same trace-cmd on the failed mount as > you've done here? > > If it does not fail, it would still be good to get a trace of the > mount that performs recovery successfully. To do that you'll > have to modify the test to trace the specific mount rather than do > it separately after the test fails... The g/047 still fails with this 2nd patch. So I did below steps [1], and get the trace output as [2], those dump_inodes() messages you added have been printed, please check. Thanks, Zorro [1] # ./check generic/047 FSTYP -- xfs (debug) PLATFORM -- Linux/s390x ibm-z-507 6.6.0+ #1 SMP Wed Nov 8 12:51:20 EST 2023 MKFS_OPTIONS -- -f -m crc=1,finobt=1,rmapbt=0,reflink=1,bigtime=1,inobtcount=1 /dev/loop1 MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/loop1 /mnt/fstests/SCRATCH_DIR generic/047 [failed, exit status 1]_check_dmesg: something found in dmesg (see /var/lib/xfstests/results//generic/047.dmesg) - output mismatch (see /var/lib/xfstests/results//generic/047.out.bad) --- tests/generic/047.out 2023-11-08 13:02:38.424036829 -0500 +++ /var/lib/xfstests/results//generic/047.out.bad 2023-11-09 20:19:37.313375849 -0500 @@ -1 +1,4 @@ QA output created by 047 +mount: /mnt/fstests/SCRATCH_DIR: mount(2) system call failed: Structure needs cleaning. +mount -o context=system_u:object_r:root_t:s0 /dev/loop1 /mnt/fstests/SCRATCH_DIR failed +(see /var/lib/xfstests/results//generic/047.full for details) ... (Run 'diff -u /var/lib/xfstests/tests/generic/047.out /var/lib/xfstests/results//generic/047.out.bad' to see the entire diff) Ran: generic/047 Failures: generic/047 Failed 1 of 1 tests # mount /dev/loop1 /mnt/fstests/SCRATCH_DIR mount: /mnt/fstests/SCRATCH_DIR: mount(2) system call failed: Structure needs cleaning. # dmesg [ 220.067517] run fstests generic/047 at 2023-11-09 20:19:33 [ 220.509393] XFS (loop1): Mounting V5 Filesystem 83788739-5d56-4018-a6ae-f808f4d9ac0b [ 220.511726] XFS (loop1): Ending clean mount [ 220.513584] XFS (loop1): User initiated shutdown received. [ 220.513591] XFS (loop1): Metadata I/O Error (0x4) detected at xfs_fs_goingdown+0x5a/0xf8 [xfs] (fs/xfs/xfs_fsops.c:492). Shutting down filesystem. [ 220.513909] XFS (loop1): Please unmount the filesystem and rectify the problem(s) [ 220.515071] XFS (loop1): Unmounting Filesystem 83788739-5d56-4018-a6ae-f808f4d9ac0b [ 220.566039] XFS (loop1): Mounting V5 Filesystem d53516b1-cb48-4c9b-b7bf-0d522a3ed6a0 [ 220.568044] XFS (loop1): Ending clean mount [ 223.389303] XFS (loop1): User initiated shutdown received. [ 223.389320] XFS (loop1): Log I/O Error (0x6) detected at xfs_fs_goingdown+0xb4/0xf8 [xfs] (fs/xfs/xfs_fsops.c:495). Shutting down filesystem. [ 223.389661] XFS (loop1): Please unmount the filesystem and rectify the problem(s) [ 223.396485] XFS (loop1): Unmounting Filesystem d53516b1-cb48-4c9b-b7bf-0d522a3ed6a0 [ 223.405668] XFS (loop1): Mounting V5 Filesystem d53516b1-cb48-4c9b-b7bf-0d522a3ed6a0 [ 223.422333] XFS (loop1): Starting recovery (logdev: internal) [ 223.436377] 00000000: 49 4e 81 80 03 02 00 00 00 00 00 00 00 00 00 00 IN.............. [ 223.436387] 00000010: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 ................ 
[ 223.436390] 00000020: 35 63 83 41 18 01 21 e9 35 63 83 41 18 01 21 e9 5c.A..!.5c.A..!. [ 223.436394] 00000030: 35 63 83 41 18 01 21 e9 00 00 00 00 00 00 80 00 5c.A..!......... [ 223.436397] 00000040: 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 00 ................ [ 223.436400] 00000050: 00 00 00 02 00 00 00 00 00 00 00 00 5a be 98 f1 ............Z... [ 223.436417] 00000060: ff ff ff ff 5e c5 ed 0a 00 00 00 00 00 00 00 04 ....^........... [ 223.436418] 00000070: 00 00 00 01 00 00 00 02 00 00 00 00 00 00 00 18 ................ [ 223.436420] 00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 223.436422] 00000090: 35 63 83 41 18 01 21 e9 00 00 00 00 00 00 00 83 5c.A..!......... [ 223.436424] 000000a0: d5 35 16 b1 cb 48 4c 9b b7 bf 0d 52 2a 3e d6 a0 .5...HL....R*>.. [ 223.436426] XFS (loop1): Internal error Bad dinode after recovery at line 580 of file fs/xfs/xfs_inode_item_recover.c. Caller xlog_recover_items_pass2+0x72/0xf0 [xfs] [ 223.436696] CPU: 1 PID: 7579 Comm: mount Kdump: loaded Not tainted 6.6.0+ #1 [ 223.436700] Hardware name: IBM 8561 LT1 400 (z/VM 7.2.0) [ 223.436701] Call Trace: [ 223.436703] [<00000000e1f5f67a>] dump_stack_lvl+0x62/0x80 [ 223.436711] [<000003ff7ff8bcd8>] xfs_corruption_error+0x70/0xa0 [xfs] [ 223.436800] [<000003ff7ffca1f4>] xlog_recover_inode_commit_pass2+0x674/0xb40 [xfs] [ 223.436894] [<000003ff7ffce1c2>] xlog_recover_items_pass2+0x72/0xf0 [xfs] [ 223.436985] [<000003ff7ffcee7e>] xlog_recover_commit_trans+0x39e/0x3c0 [xfs] [ 223.437067] [<000003ff7ffcefe0>] xlog_recovery_process_trans+0x140/0x148 [xfs] [ 223.437164] [<000003ff7ffcf0a8>] xlog_recover_process_ophdr+0xc0/0x180 [xfs] [ 223.437272] [<000003ff7ffcfb6e>] xlog_recover_process_data+0xb6/0x168 [xfs] [ 223.437381] [<000003ff7ffcfd24>] xlog_recover_process+0x104/0x150 [xfs] [ 223.437490] [<000003ff7ffd0122>] xlog_do_recovery_pass+0x3b2/0x748 [xfs] [ 223.437594] [<000003ff7ffd0910>] xlog_do_log_recovery+0x88/0xd8 [xfs] [ 223.437697] [<000003ff7ffd09ac>] xlog_do_recover+0x4c/0x218 [xfs] [ 223.437799] [<000003ff7ffd1eb2>] xlog_recover+0xda/0x1a0 [xfs] [ 223.437905] [<000003ff7ffb7ebe>] xfs_log_mount+0x11e/0x280 [xfs] [ 223.438013] [<000003ff7ffa9cee>] xfs_mountfs+0x3e6/0x928 [xfs] [ 223.438120] [<000003ff7ffb157c>] xfs_fs_fill_super+0x40c/0x7d8 [xfs] [ 223.438228] [<00000000e1910fec>] get_tree_bdev+0x144/0x1d0 [ 223.438233] [<00000000e190ea20>] vfs_get_tree+0x38/0x110 [ 223.438236] [<00000000e194140a>] do_new_mount+0x17a/0x2d0 [ 223.438240] [<00000000e1942084>] path_mount+0x1ac/0x818 [ 223.438243] [<00000000e19427f4>] __s390x_sys_mount+0x104/0x148 [ 223.438246] [<00000000e1f85910>] __do_syscall+0x1d0/0x1f8 [ 223.438249] [<00000000e1f95bc0>] system_call+0x70/0x98 [ 223.438252] XFS (loop1): Corruption detected. Unmount and run xfs_repair [ 223.438255] XFS (loop1): Metadata corruption detected at xfs_dinode_verify.part.0+0x2c6/0x580 [xfs], inode 0x83 [ 223.438415] XFS (loop1): log mount/recovery failed: error -117 [ 223.438903] XFS (loop1): log mount failed [ 223.486754] XFS (loop0): Unmounting Filesystem a488de9e-d346-4098-aff4-84e6ca6936c7 [ 345.343101] XFS (loop1): Mounting V5 Filesystem d53516b1-cb48-4c9b-b7bf-0d522a3ed6a0 [ 345.354273] XFS (loop1): Starting recovery (logdev: internal) [ 345.362837] 00000000: 49 4e 81 80 03 02 00 00 00 00 00 00 00 00 00 00 IN.............. [ 345.362840] 00000010: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 345.362842] 00000020: 35 63 83 41 18 01 21 e9 35 63 83 41 18 01 21 e9 5c.A..!.5c.A..!. 
[ 345.362845] 00000030: 35 63 83 41 18 01 21 e9 00 00 00 00 00 00 80 00 5c.A..!......... [ 345.362847] 00000040: 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 00 ................ [ 345.362849] 00000050: 00 00 00 02 00 00 00 00 00 00 00 00 5a be 98 f1 ............Z... [ 345.362851] 00000060: ff ff ff ff 5e c5 ed 0a 00 00 00 00 00 00 00 04 ....^........... [ 345.362853] 00000070: 00 00 00 01 00 00 00 02 00 00 00 00 00 00 00 18 ................ [ 345.362855] 00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 345.362858] 00000090: 35 63 83 41 18 01 21 e9 00 00 00 00 00 00 00 83 5c.A..!......... [ 345.362919] 000000a0: d5 35 16 b1 cb 48 4c 9b b7 bf 0d 52 2a 3e d6 a0 .5...HL....R*>.. [ 345.362928] XFS (loop1): Internal error Bad dinode after recovery at line 580 of file fs/xfs/xfs_inode_item_recover.c. Caller xlog_recover_items_pass2+0x72/0xf0 [xfs] ... ... [ 345.362928] XFS (loop1): Internal error Bad dinode after recovery at line 580 of file fs/xfs/xfs_inode_item_recover.c. Caller xlog_recover_items_pass2+0x72/0xf0 [xfs] [ 345.363185] CPU: 0 PID: 7657 Comm: mount Kdump: loaded Not tainted 6.6.0+ #1 [ 345.363188] Hardware name: IBM 8561 LT1 400 (z/VM 7.2.0) [ 345.363190] Call Trace: [ 345.363192] [<00000000e1f5f67a>] dump_stack_lvl+0x62/0x80 [ 345.363199] [<000003ff7ff8bcd8>] xfs_corruption_error+0x70/0xa0 [xfs] [ 345.363305] [<000003ff7ffca1f4>] xlog_recover_inode_commit_pass2+0x674/0xb40 [xfs] [ 345.363387] [<000003ff7ffce1c2>] xlog_recover_items_pass2+0x72/0xf0 [xfs] [ 345.363469] [<000003ff7ffcee7e>] xlog_recover_commit_trans+0x39e/0x3c0 [xfs] [ 345.363550] [<000003ff7ffcefe0>] xlog_recovery_process_trans+0x140/0x148 [xfs] [ 345.363631] [<000003ff7ffcf0a8>] xlog_recover_process_ophdr+0xc0/0x180 [xfs] [ 345.363713] [<000003ff7ffcfb6e>] xlog_recover_process_data+0xb6/0x168 [xfs] [ 345.363824] [<000003ff7ffcfd24>] xlog_recover_process+0x104/0x150 [xfs] [ 345.363906] [<000003ff7ffd0122>] xlog_do_recovery_pass+0x3b2/0x748 [xfs] [ 345.363987] [<000003ff7ffd0910>] xlog_do_log_recovery+0x88/0xd8 [xfs] [ 345.364067] [<000003ff7ffd09ac>] xlog_do_recover+0x4c/0x218 [xfs] [ 345.364146] [<000003ff7ffd1eb2>] xlog_recover+0xda/0x1a0 [xfs] [ 345.364228] [<000003ff7ffb7ebe>] xfs_log_mount+0x11e/0x280 [xfs] [ 345.364307] [<000003ff7ffa9cee>] xfs_mountfs+0x3e6/0x928 [xfs] [ 345.364386] [<000003ff7ffb157c>] xfs_fs_fill_super+0x40c/0x7d8 [xfs] [ 345.364465] [<00000000e1910fec>] get_tree_bdev+0x144/0x1d0 [ 345.364470] [<00000000e190ea20>] vfs_get_tree+0x38/0x110 [ 345.364472] [<00000000e194140a>] do_new_mount+0x17a/0x2d0 [ 345.364476] [<00000000e1942084>] path_mount+0x1ac/0x818 [ 345.364478] [<00000000e19427f4>] __s390x_sys_mount+0x104/0x148 [ 345.364480] [<00000000e1f85910>] __do_syscall+0x1d0/0x1f8 [ 345.364483] [<00000000e1f95bc0>] system_call+0x70/0x98 [ 345.364485] XFS (loop1): Corruption detected. Unmount and run xfs_repair [ 345.364486] XFS (loop1): Metadata corruption detected at xfs_dinode_verify.part.0+0x2c6/0x580 [xfs], inode 0x83 [ 345.364596] XFS (loop1): log mount/recovery failed: error -117 [ 345.364877] XFS (loop1): log mount failed # trace-cmd record -e xfs\* -e xlog\* -e printk mount /dev/loop1 /mnt/test mount: /mnt/test: mount(2) system call failed: Structure needs cleaning. 
CPU0 data recorded at offset=0x634000 6385664 bytes in size CPU1 data recorded at offset=0xc4b000 0 bytes in size # trace-cmd report > events-with-dave-patch2.txt # bzip2 events-with-dave-patch2.txt [2] https://drive.google.com/file/d/1Ut1EUmlkxqxuyIhrLcLfhwEhLfeCABs4/view?usp=sharing > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > > xfs: s390 inode recovery debug > > --- > fs/xfs/xfs_inode_item_recover.c | 44 +++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 44 insertions(+) > > diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c > index f4c31c2b60d5..2a0166702192 100644 > --- a/fs/xfs/xfs_inode_item_recover.c > +++ b/fs/xfs/xfs_inode_item_recover.c > @@ -165,6 +165,41 @@ xfs_log_dinode_to_disk_iext_counters( > > } > > +static void dump_inodes( > + struct xfs_log_dinode *ldip, > + struct xfs_dinode *dip, > + char *ctx) > +{ > + smp_mb(); > + > + trace_printk( > +"disk ino 0x%llx: %s nblocks 0x%llx nextents 0x%x/0x%llx anextents 0x%x/0x%llx v3_pad 0x%llx nrext64_pad 0x%x di_flags2 0x%llx", > + be64_to_cpu(dip->di_ino), > + ctx, > + be64_to_cpu(dip->di_nblocks), > + be32_to_cpu(dip->di_nextents), > + be64_to_cpu(dip->di_big_nextents), > + be32_to_cpu(dip->di_anextents), > + be64_to_cpu(dip->di_big_anextents), > + be64_to_cpu(dip->di_v3_pad), > + be16_to_cpu(dip->di_nrext64_pad), > + be64_to_cpu(dip->di_flags2)); > + trace_printk( > +"log ino 0x%llx: %s nblocks 0x%llx nextents 0x%x/0x%llx anextents 0x%x/0x%x v3_pad 0x%llx nrext64_pad 0x%x di_flags2 0x%llx %s", > + ldip->di_ino, > + ctx, > + ldip->di_nblocks, > + ldip->di_nextents, > + ldip->di_big_nextents, > + ldip->di_anextents, > + ldip->di_big_anextents, > + ldip->di_v3_pad, > + ldip->di_nrext64_pad, > + ldip->di_flags2, > + xfs_log_dinode_has_large_extent_counts(ldip) ? "big" : "small"); > + smp_mb(); > +} > + > STATIC void > xfs_log_dinode_to_disk( > struct xfs_log_dinode *from, > @@ -196,6 +231,8 @@ xfs_log_dinode_to_disk( > to->di_flags = cpu_to_be16(from->di_flags); > to->di_gen = cpu_to_be32(from->di_gen); > > + dump_inodes(from, to, "before v3"); > + > if (from->di_version == 3) { > to->di_changecount = cpu_to_be64(from->di_changecount); > to->di_crtime = xfs_log_dinode_to_disk_ts(from, > @@ -212,6 +249,8 @@ xfs_log_dinode_to_disk( > memset(to->di_v2_pad, 0, sizeof(to->di_v2_pad)); > } > > + dump_inodes(from, to, "before iext"); > + > xfs_log_dinode_to_disk_iext_counters(from, to); > } > > @@ -337,6 +376,7 @@ xlog_recover_inode_commit_pass2( > goto out_release; > } > > + dump_inodes(ldip, dip, "init"); > /* > * If the inode has an LSN in it, recover the inode only if the on-disk > * inode's LSN is older than the lsn of the transaction we are > @@ -441,6 +481,7 @@ xlog_recover_inode_commit_pass2( > goto out_release; > } > > + dump_inodes(ldip, dip, "pre"); > /* > * Recover the log dinode inode into the on disk inode. > * > @@ -453,6 +494,8 @@ xlog_recover_inode_commit_pass2( > */ > xfs_log_dinode_to_disk(ldip, dip, current_lsn); > > + dump_inodes(ldip, dip, "post"); > + > fields = in_f->ilf_fields; > if (fields & XFS_ILOG_DEV) > xfs_dinode_put_rdev(dip, in_f->ilf_u.ilfu_rdev); > @@ -530,6 +573,7 @@ xlog_recover_inode_commit_pass2( > (dip->di_mode != 0)) > error = xfs_recover_inode_owner_change(mp, dip, in_f, > buffer_list); > + dump_inodes(ldip, dip, "done"); > /* re-generate the checksum and validate the recovered inode. */ > xfs_dinode_calc_crc(log->l_mp, dip); > fa = xfs_dinode_verify(log->l_mp, in_f->ilf_ino, dip); > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bug report][fstests generic/047] Internal error !(flags & XFS_DABUF_MAP_HOLE_OK) at line 2572 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_dabuf_map.constprop.0+0x26c/0x368 [xfs]
  2023-11-10  1:36 ` Zorro Lang
@ 2023-11-10  2:03   ` Dave Chinner
  2023-11-10  4:32     ` Darrick J. Wong
                        ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Dave Chinner @ 2023-11-10 2:03 UTC (permalink / raw)
To: Zorro Lang; +Cc: linux-xfs, fstests, Darrick J. Wong, Carlos Maiolino

On Fri, Nov 10, 2023 at 09:36:51AM +0800, Zorro Lang wrote:
> The g/047 still fails with this 2nd patch. So I did below steps [1],
> and get the trace output as [2], those dump_inodes() messages you
> added have been printed, please check.

And that points me at the bug.

dump_inodes: disk ino 0x83: init nblocks 0x8 nextents 0x0/0x0 anextents 0x0/0x0 v3_pad 0x0 nrext64_pad 0x0 di_flags2 0x18
dump_inodes: log ino 0x83: init nblocks 0x8 nextents 0x0/0x1 anextents 0x0/0x0 v3_pad 0x1 nrext64_pad 0x0 di_flags2 0x18 big
                                                      ^^^^^^^
The initial log inode is correct.

dump_inodes: disk ino 0x83: pre nblocks 0x8 nextents 0x0/0x0 anextents 0x0/0x0 v3_pad 0x0 nrext64_pad 0x0 di_flags2 0x18
dump_inodes: log ino 0x83: pre nblocks 0x8 nextents 0x0/0x0 anextents 0x0/0x0 v3_pad 0x0 nrext64_pad 0x0 di_flags2 0x18 big
                                                     ^^^^^^^
.... but on the second sample, it's been modified and the extent
count has been zeroed? Huh, that is unexpected - what did that?

Oh.

Can you test the patch below and see if it fixes the issue? Keep
the first verifier patch I sent, then apply the patch below. You can
drop the debug trace_printk patch - the patch below should fix it.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

xfs: recovery should not clear di_flushiter unconditionally

From: Dave Chinner <dchinner@redhat.com>

Because on v3 inodes, di_flushiter doesn't exist. It overlaps with
zero padding in the inode, except when NREXT64=1 configurations are
in use and the zero padding is no longer padding but holds the 64
bit extent counter.

This manifests obviously on big endian platforms (e.g. s390) because
the log dinode is in host order and the overlap is the LSBs of the
extent count field. It is not noticed on little endian machines
because the overlap is at the MSB end of the extent count field and
we need to get more than 2^48 extents in the inode before it
manifests. i.e. the heat death of the universe will occur before we
see the problem on little endian machines.

This is a zero-day issue for NREXT64=1 configurations on big endian
machines. Fix it by only clearing di_flushiter on v2 inodes during
recovery.
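[The endian asymmetry the commit message describes can be reproduced with a stand-alone mock of the log dinode's host-order union. This is a sketch under the assumption that plain uintN_t fields model the kernel layout faithfully; it is not kernel code:]

#include <stdint.h>
#include <stdio.h>

/* Mock of the host-order extent-count area of the log dinode: the
 * 16-bit di_flushiter overlaps bytes 6-7 of di_big_nextents. */
union extent_area_mock {
	uint64_t di_big_nextents;
	struct {
		uint8_t  di_v2_pad[6];
		uint16_t di_flushiter;
	};
};

int main(void)
{
	union extent_area_mock a;

	a.di_big_nextents = 1;	/* one data extent, as on inode 0x83 */
	a.di_flushiter = 0;	/* the unconditional reset in recovery */

	/*
	 * On a big-endian host bytes 6-7 are the least significant
	 * bytes of the 64-bit count, so it collapses to 0; on a
	 * little-endian host they are bits 48-63, so the clobber is
	 * invisible for any realistic extent count.
	 */
	printf("nextents after flushiter reset: %llu\n",
	       (unsigned long long)a.di_big_nextents);
	return 0;
}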
* Re: [Bug report][fstests generic/047] Internal error !(flags & XFS_DABUF_MAP_HOLE_OK) at line 2572 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_dabuf_map.constprop.0+0x26c/0x368 [xfs] 2023-11-10 2:03 ` Dave Chinner @ 2023-11-10 4:32 ` Darrick J. Wong 2023-11-10 7:34 ` Christoph Hellwig ` (2 subsequent siblings) 3 siblings, 0 replies; 19+ messages in thread From: Darrick J. Wong @ 2023-11-10 4:32 UTC (permalink / raw) To: Dave Chinner; +Cc: Zorro Lang, linux-xfs, fstests, Carlos Maiolino On Fri, Nov 10, 2023 at 01:03:49PM +1100, Dave Chinner wrote: > On Fri, Nov 10, 2023 at 09:36:51AM +0800, Zorro Lang wrote: > > The g/047 still fails with this 2nd patch. So I did below steps [1], > > and get the trace output as [2], those dump_inodes() messages you > > added have been printed, please check. > > And that points me at the bug. > > dump_inodes: disk ino 0x83: init nblocks 0x8 nextents 0x0/0x0 anextents 0x0/0x0 v3_pad 0x0 nrext64_pad 0x0 di_flags2 0x18 > dump_inodes: log ino 0x83: init nblocks 0x8 nextents 0x0/0x1 anextents 0x0/0x0 v3_pad 0x1 nrext64_pad 0x0 di_flags2 0x18 big > ^^^^^^^ > The initial log inode is correct. > > dump_inodes: disk ino 0x83: pre nblocks 0x8 nextents 0x0/0x0 anextents 0x0/0x0 v3_pad 0x0 nrext64_pad 0x0 di_flags2 0x18 > dump_inodes: log ino 0x83: pre nblocks 0x8 nextents 0x0/0x0 anextents 0x0/0x0 v3_pad 0x0 nrext64_pad 0x0 di_flags2 0x18 big > ^^^^^^^ > > .... but on the second sample, it's been modified and the extent > count has been zeroed? Huh, that is unexpected - what did that? > > Oh. > > Can you test the patch below and see if it fixes the issue? Keep > the first verifier patch I sent, then apply the patch below. You can > drop the debug traceprintk patch - the patch below should fix it. > > -Dave. > -- > Dave Chinner > david@fromorbit.com > > xfs: recovery should not clear di_flushiter unconditionally > > From: Dave Chinner <dchinner@redhat.com> > > Because on v3 inodes, di_flushiter doesn't exist. It overlaps with > zero padding in the inode, except when NREXT64=1 configurations are > in use and the zero padding is no longer padding but holds the 64 > bit extent counter. > > This manifests obviously on big endian platforms (e.g. s390) because > the log dinode is in host order and the overlap is the LSBs of the > extent count field. It is not noticed on little endian machines > because the overlap is at the MSB end of the extent count field and > we need to get more than 2^^48 extents in the inode before it > manifests. i.e. the heat death of the universe will occur before we > see the problem in little endian machines. > > This is a zero-day issue for NREXT64=1 configuraitons on big endian > machines. Fix it by only clearing di_flushiter on v2 inodes during > recovery. 
>
> Fixes: 9b7d16e34bbe ("xfs: Introduce XFS_DIFLAG2_NREXT64 and associated helpers")
> cc: stable@kernel.org # 5.19+
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_inode_item_recover.c | 32 +++++++++++++++++---------------
>  1 file changed, 17 insertions(+), 15 deletions(-)
>
> diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c
> index f4c31c2b60d5..dbdab4ce7c44 100644
> --- a/fs/xfs/xfs_inode_item_recover.c
> +++ b/fs/xfs/xfs_inode_item_recover.c
> @@ -371,24 +371,26 @@ xlog_recover_inode_commit_pass2(
>  	 * superblock flag to determine whether we need to look at di_flushiter
>  	 * to skip replay when the on disk inode is newer than the log one
>  	 */
> -	if (!xfs_has_v3inodes(mp) &&
> -	    ldip->di_flushiter < be16_to_cpu(dip->di_flushiter)) {
> -		/*
> -		 * Deal with the wrap case, DI_MAX_FLUSH is less
> -		 * than smaller numbers
> -		 */
> -		if (be16_to_cpu(dip->di_flushiter) == DI_MAX_FLUSH &&
> -		    ldip->di_flushiter < (DI_MAX_FLUSH >> 1)) {
> -			/* do nothing */
> -		} else {
> -			trace_xfs_log_recover_inode_skip(log, in_f);
> -			error = 0;
> -			goto out_release;
> +	if (!xfs_has_v3inodes(mp)) {
> +		if (ldip->di_flushiter < be16_to_cpu(dip->di_flushiter)) {
> +			/*
> +			 * Deal with the wrap case, DI_MAX_FLUSH is less
> +			 * than smaller numbers
> +			 */
> +			if (be16_to_cpu(dip->di_flushiter) == DI_MAX_FLUSH &&
> +			    ldip->di_flushiter < (DI_MAX_FLUSH >> 1)) {
> +				/* do nothing */
> +			} else {
> +				trace_xfs_log_recover_inode_skip(log, in_f);
> +				error = 0;
> +				goto out_release;
> +			}
>  		}
> +
> +		/* Take the opportunity to reset the flush iteration count */
> +		ldip->di_flushiter = 0;
>  	}
>
> -	/* Take the opportunity to reset the flush iteration count */
> -	ldip->di_flushiter = 0;

That's an unfortunate logic bomb left over from the V5 introduction.
I guess it was benign until we finally reused that part of the
xfs_dinode.

If this fixes Zorro's machine, then:
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

I wonder, if we made XFS_SUPPORT_V4=n turn xfs_has_v3inodes and
xfs_has_crc into #define'd true symbols, could we then rename all the
V4-only fields to see what stops compiling?

(Probably not, gcc will still want to parse it all even if the source
code itself is dead...)

--D

>
>  	if (unlikely(S_ISREG(ldip->di_mode))) {
>  		if ((ldip->di_format != XFS_DINODE_FMT_EXTENTS) &&
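The XFS_SUPPORT_V4=n idea Darrick floats would look roughly like the
sketch below. This is purely hypothetical: CONFIG_XFS_SUPPORT_V4 is a real
Kconfig option, but the in-tree predicates are generated by the
__XFS_HAS_FEAT() macro rather than spelled out like this, and the feature
flag names here are illustrative only.

/* Hypothetical sketch; not how fs/xfs/xfs_mount.h defines these. */
#ifdef CONFIG_XFS_SUPPORT_V4
# define xfs_has_v3inodes(mp)	((mp)->m_features & XFS_FEAT_V3INODES)
# define xfs_has_crc(mp)	((mp)->m_features & XFS_FEAT_CRC)
#else
/* V4 support compiled out: every supported filesystem is V5. */
# define xfs_has_v3inodes(mp)	(true)
# define xfs_has_crc(mp)	(true)
#endif

As the parenthetical concedes, the compiler still parses branches that
constant folding would discard, so renaming the V4-only fields would break
the build even where xfs_has_v3inodes() is compile-time true.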
* Re: [Bug report][fstests generic/047] Internal error !(flags & XFS_DABUF_MAP_HOLE_OK) at line 2572 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_dabuf_map.constprop.0+0x26c/0x368 [xfs]
  2023-11-10  2:03   ` Dave Chinner
  2023-11-10  4:32     ` Darrick J. Wong
@ 2023-11-10  7:34     ` Christoph Hellwig
  2023-11-10 13:56     ` Zorro Lang
  2023-11-14 11:17     ` edward6
  3 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2023-11-10  7:34 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Zorro Lang, linux-xfs, fstests, Darrick J. Wong, Carlos Maiolino

> +	if (!xfs_has_v3inodes(mp)) {
> +		if (ldip->di_flushiter < be16_to_cpu(dip->di_flushiter)) {
> +			/*
> +			 * Deal with the wrap case, DI_MAX_FLUSH is less
> +			 * than smaller numbers
> +			 */
> +			if (be16_to_cpu(dip->di_flushiter) == DI_MAX_FLUSH &&
> +			    ldip->di_flushiter < (DI_MAX_FLUSH >> 1)) {
> +				/* do nothing */
> +			} else {
> +				trace_xfs_log_recover_inode_skip(log, in_f);
> +				error = 0;
> +				goto out_release;
> +			}

This just moves the existing code, but the conditional style is really
weird vs a simple:

	if (be16_to_cpu(dip->di_flushiter) != DI_MAX_FLUSH ||
	    ldip->di_flushiter >= (DI_MAX_FLUSH >> 1)) {
		trace_xfs_log_recover_inode_skip(log, in_f);
		error = 0;
		goto out_release;
	}

Nitpicking aside, this looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
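For reference, folding that flattened condition into the restructured hunk
would read roughly as follows; this is a sketch of the combined logic, not
the code that was committed:

	if (!xfs_has_v3inodes(mp)) {
		if (ldip->di_flushiter < be16_to_cpu(dip->di_flushiter)) {
			/*
			 * Skip the replay unless this is the wrap case,
			 * where DI_MAX_FLUSH on disk counts as older than
			 * a small counter in the log.
			 */
			if (be16_to_cpu(dip->di_flushiter) != DI_MAX_FLUSH ||
			    ldip->di_flushiter >= (DI_MAX_FLUSH >> 1)) {
				trace_xfs_log_recover_inode_skip(log, in_f);
				error = 0;
				goto out_release;
			}
		}

		/* Take the opportunity to reset the flush iteration count */
		ldip->di_flushiter = 0;
	}

The inverted test is De Morgan applied to the original wrap check, which
lets the empty "do nothing" branch disappear.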
* Re: [Bug report][fstests generic/047] Internal error !(flags & XFS_DABUF_MAP_HOLE_OK) at line 2572 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_dabuf_map.constprop.0+0x26c/0x368 [xfs]
  2023-11-10  2:03   ` Dave Chinner
  2023-11-10  4:32     ` Darrick J. Wong
  2023-11-10  7:34     ` Christoph Hellwig
@ 2023-11-10 13:56     ` Zorro Lang
  2023-11-14 11:17     ` edward6
  3 siblings, 0 replies; 19+ messages in thread
From: Zorro Lang @ 2023-11-10 13:56 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, fstests, Darrick J. Wong, Carlos Maiolino

On Fri, Nov 10, 2023 at 01:03:49PM +1100, Dave Chinner wrote:
> On Fri, Nov 10, 2023 at 09:36:51AM +0800, Zorro Lang wrote:
> > The g/047 still fails with this 2nd patch. So I did below steps [1],
> > and get the trace output as [2], those dump_inodes() messages you
> > added have been printed, please check.
>
> And that points me at the bug.
>
> dump_inodes: disk ino 0x83: init nblocks 0x8 nextents 0x0/0x0 anextents 0x0/0x0 v3_pad 0x0 nrext64_pad 0x0 di_flags2 0x18
> dump_inodes: log ino 0x83: init nblocks 0x8 nextents 0x0/0x1 anextents 0x0/0x0 v3_pad 0x1 nrext64_pad 0x0 di_flags2 0x18 big
>                                                      ^^^^^^^
> The initial log inode is correct.
>
> dump_inodes: disk ino 0x83: pre nblocks 0x8 nextents 0x0/0x0 anextents 0x0/0x0 v3_pad 0x0 nrext64_pad 0x0 di_flags2 0x18
> dump_inodes: log ino 0x83: pre nblocks 0x8 nextents 0x0/0x0 anextents 0x0/0x0 v3_pad 0x0 nrext64_pad 0x0 di_flags2 0x18 big
>                                                     ^^^^^^^
> .... but on the second sample, it's been modified and the extent
> count has been zeroed? Huh, that is unexpected - what did that?
>
> Oh.
>
> Can you test the patch below and see if it fixes the issue? Keep
> the first verifier patch I sent, then apply the patch below. You can
> drop the debug traceprintk patch - the patch below should fix it.

Great, Dave! It works; the testing below [1] passes on my s390x now.
Maybe I should give a Tested-by, but as the reporter, Reported-by
might be more proper, so...

Reported-by: Zorro Lang <zlang@redhat.com>

Anyway, I'm doing a full round of fstests "auto" group testing on
s390x with this patch, to check whether there are more issues on big
endian machines. If there's any specific concern (testing condition)
you'd like covered, please tell me and I'll enhance the testing of
that part.

Thanks,
Zorro

[1]
# ./check generic/047 generic/039 generic/065
FSTYP         -- xfs (debug)
PLATFORM      -- Linux/s390x ibm-z-507 6.6.0+ #1 SMP Wed Nov 8 12:51:20 EST 2023
MKFS_OPTIONS  -- -f -m crc=1,finobt=1,rmapbt=0,reflink=1,bigtime=1,inobtcount=1 /dev/loop1
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/loop1 /mnt/fstests/SCRATCH_DIR

generic/039	 0s
generic/047	 10s ...  10s
generic/065	 1s
Ran: generic/039 generic/047 generic/065
Passed all 3 tests

>
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
> xfs: recovery should not clear di_flushiter unconditionally
>
> From: Dave Chinner <dchinner@redhat.com>
>
> Because on v3 inodes, di_flushiter doesn't exist. It overlaps with
> zero padding in the inode, except when NREXT64=1 configurations are
> in use and the zero padding is no longer padding but holds the 64
> bit extent counter.
>
> This manifests obviously on big endian platforms (e.g. s390) because
> the log dinode is in host order and the overlap is the LSBs of the
> extent count field. It is not noticed on little endian machines
> because the overlap is at the MSB end of the extent count field and
> we need to get more than 2^48 extents in the inode before it
> manifests. i.e. the heat death of the universe will occur before we
> see the problem on little endian machines.
>
> This is a zero-day issue for NREXT64=1 configurations on big endian
> machines. Fix it by only clearing di_flushiter on v2 inodes during
> recovery.
>
> Fixes: 9b7d16e34bbe ("xfs: Introduce XFS_DIFLAG2_NREXT64 and associated helpers")
> cc: stable@kernel.org # 5.19+
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_inode_item_recover.c | 32 +++++++++++++++++---------------
>  1 file changed, 17 insertions(+), 15 deletions(-)
>
> diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c
> index f4c31c2b60d5..dbdab4ce7c44 100644
> --- a/fs/xfs/xfs_inode_item_recover.c
> +++ b/fs/xfs/xfs_inode_item_recover.c
> @@ -371,24 +371,26 @@ xlog_recover_inode_commit_pass2(
>  	 * superblock flag to determine whether we need to look at di_flushiter
>  	 * to skip replay when the on disk inode is newer than the log one
>  	 */
> -	if (!xfs_has_v3inodes(mp) &&
> -	    ldip->di_flushiter < be16_to_cpu(dip->di_flushiter)) {
> -		/*
> -		 * Deal with the wrap case, DI_MAX_FLUSH is less
> -		 * than smaller numbers
> -		 */
> -		if (be16_to_cpu(dip->di_flushiter) == DI_MAX_FLUSH &&
> -		    ldip->di_flushiter < (DI_MAX_FLUSH >> 1)) {
> -			/* do nothing */
> -		} else {
> -			trace_xfs_log_recover_inode_skip(log, in_f);
> -			error = 0;
> -			goto out_release;
> +	if (!xfs_has_v3inodes(mp)) {
> +		if (ldip->di_flushiter < be16_to_cpu(dip->di_flushiter)) {
> +			/*
> +			 * Deal with the wrap case, DI_MAX_FLUSH is less
> +			 * than smaller numbers
> +			 */
> +			if (be16_to_cpu(dip->di_flushiter) == DI_MAX_FLUSH &&
> +			    ldip->di_flushiter < (DI_MAX_FLUSH >> 1)) {
> +				/* do nothing */
> +			} else {
> +				trace_xfs_log_recover_inode_skip(log, in_f);
> +				error = 0;
> +				goto out_release;
> +			}
>  		}
> +
> +		/* Take the opportunity to reset the flush iteration count */
> +		ldip->di_flushiter = 0;
>  	}
>
> -	/* Take the opportunity to reset the flush iteration count */
> -	ldip->di_flushiter = 0;
>
>  	if (unlikely(S_ISREG(ldip->di_mode))) {
>  		if ((ldip->di_format != XFS_DINODE_FMT_EXTENTS) &&
* Re: [Bug report][fstests generic/047] Internal error !(flags & XFS_DABUF_MAP_HOLE_OK) at line 2572 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_dabuf_map.constprop.0+0x26c/0x368 [xfs]
  2023-11-10  2:03   ` Dave Chinner
                       ` (2 preceding siblings ...)
  2023-11-10 13:56     ` Zorro Lang
@ 2023-11-14 11:17     ` edward6
  3 siblings, 0 replies; 19+ messages in thread
From: edward6 @ 2023-11-14 11:17 UTC (permalink / raw)
  To: david
  Cc: carlos, djwong, fstests, linux-xfs, zlang, linux-s390,
	Eduard Shishkin

From: Eduard Shishkin <edward6@linux.ibm.com>

[...]

> Can you test the patch below and see if it fixes the issue? Keep
> the first verifier patch I sent, then apply the patch below. You can
> drop the debug traceprintk patch - the patch below should fix it.

Thanks for fixing it!

Tested-by: Eduard Shishkin <edward6@linux.ibm.com>

[...]

> This manifests obviously on big endian platforms (e.g. s390) because
> the log dinode is in host order and the overlap is the LSBs of the
> extent count field. It is not noticed on little endian machines
> because the overlap is at the MSB end of the extent count field and
> we need to get more than 2^48 extents in the inode before it
> manifests. i.e. the heat death of the universe will occur before we
> see the problem on little endian machines.

This sounds too bold. I can easily imagine mountable images similar
to filesystem metadata dumps that are not generated by scanning real
partitions but created by a special tool, e.g. for debugging purposes.
My point is that on little-endian architectures the manifestation of
such a "sleeping" bug is a much more realistic event than it may seem
at first glance.
* Re: [Bug report][fstests generic/047] Internal error !(flags & XFS_DABUF_MAP_HOLE_OK) at line 2572 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_dabuf_map.constprop.0+0x26c/0x368 [xfs]
  2023-11-06 20:33 ` Dave Chinner
  2023-11-06 22:20   ` Darrick J. Wong
  2023-11-07  8:05   ` Zorro Lang
@ 2023-11-07  8:29   ` Christoph Hellwig
  2 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2023-11-07  8:29 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Zorro Lang, linux-xfs, fstests, Darrick J. Wong, Carlos Maiolino

On Tue, Nov 07, 2023 at 07:33:50AM +1100, Dave Chinner wrote:
> That's not good. Can you please determine if this is a zero-day bug
> with the nrext64 feature? I think it was merged in 5.19, so if you
> could try to reproduce it on 5.18 and 5.19 kernels first, that
> would be handy.
>
> Also, from your s390 kernel build, can you get the pahole output
> for the struct xfs_dinode both for a good kernel and a bad kernel?

Another interesting angle is that s390 is the only mainstream big
endian platform left. It might be interesting to also try to
reproduce this on a ppc64be or sparc64 system (if someone has one,
or a VM).
Thread overview: 19+ messages
2023-10-29  4:11 [Bug report][fstests generic/047] Internal error !(flags & XFS_DABUF_MAP_HOLE_OK) at line 2572 of file fs/xfs/libxfs/xfs_da_btree.c. Caller xfs_dabuf_map.constprop.0+0x26c/0x368 [xfs] Zorro Lang
2023-11-06  6:13 ` Dave Chinner
2023-11-06 19:26   ` Zorro Lang
2023-11-06 20:33     ` Dave Chinner
2023-11-06 22:20       ` Darrick J. Wong
2023-11-07  8:05       ` Zorro Lang
2023-11-07  8:13         ` Dave Chinner
2023-11-07 15:13           ` Zorro Lang
2023-11-08  6:38             ` Dave Chinner
     [not found]               ` <CAN=2_H+CdEK_rEUmYbmkCjSRqhX2cwi5yRHQcKAmKDPF16vqOw@mail.gmail.com>
2023-11-09  6:14                 ` Dave Chinner
2023-11-09 14:09                   ` Zorro Lang
2023-11-09 23:13                     ` Dave Chinner
2023-11-10  1:36                       ` Zorro Lang
2023-11-10  2:03                         ` Dave Chinner
2023-11-10  4:32                           ` Darrick J. Wong
2023-11-10  7:34                           ` Christoph Hellwig
2023-11-10 13:56                           ` Zorro Lang
2023-11-14 11:17                           ` edward6
2023-11-07  8:29       ` Christoph Hellwig