* [PATCH v5 07/10] fstests: verify IMA isolation on cloned filesystems
From: Anand Jain @ 2026-05-21 12:54 UTC (permalink / raw)
To: fstests
Cc: linux-btrfs, linux-ext4, linux-xfs, linux-f2fs, amir73il, zlang,
hch
In-Reply-To: <cover.1779367627.git.asj@kernel.org>
Add a test case to verify IMA measurement isolation when multiple devices
share the same FSUUID.
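As an illustration of the log check this test performs, here is a minimal sketch of pulling the measured path out of an IMA measurement list. It assumes the common layout in which each entry has five space-separated fields and the path is the fifth. The helper name `ima_path_for` and the sample entry are hypothetical, not taken from the patch or from a real system.

```shell
#!/bin/bash
# Hypothetical helper: print the path column (field 5) of every log
# entry that mentions the given file name.
ima_path_for()
{
    local name=$1 log=$2
    grep -F -- "$name" "$log" | awk '{ print $5 }'
}

# Synthetic entry standing in for ascii_runtime_measurements.
sample_log=$(mktemp)
echo "10 0000000000000000000000000000000000000000 ima-ng sha256:1111 /mnt1/foobar_ab12c" > "$sample_log"

measured=$(ima_path_for foobar_ab12c "$sample_log")
echo "$measured"
rm -f "$sample_log"
```

The test itself reads the real log under /sys/kernel/security/ima, which requires root and a writable IMA policy.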
Signed-off-by: Anand Jain <asj@kernel.org>
---
tests/generic/804 | 103 ++++++++++++++++++++++++++++++++++++++++++
tests/generic/804.out | 10 ++++
2 files changed, 113 insertions(+)
create mode 100644 tests/generic/804
create mode 100644 tests/generic/804.out
diff --git a/tests/generic/804 b/tests/generic/804
new file mode 100644
index 000000000000..9f3459015422
--- /dev/null
+++ b/tests/generic/804
@@ -0,0 +1,103 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2026 Anand Jain <asj@kernel.org>. All Rights Reserved.
+#
+# FS QA Test 804
+# Verify IMA isolation on cloned filesystems:
+# . Mount two devices sharing the same FSUUID (cloned).
+# . Apply an IMA policy to measure files based on that FSUUID.
+# . Create unique files on each mount point to trigger measurements.
+# . Confirm the IMA log correctly attributes events to the respective mounts.
+
+. ./common/preamble
+. ./common/filter
+
+_begin_fstest auto quick clone
+
+_require_test
+_require_block_device $TEST_DEV
+_require_loop
+
+[ "$FSTYP" = "btrfs" ] && _fixed_by_kernel_commit xxxxxxxxxxxx \
+ "btrfs: use on-disk uuid for s_uuid in temp_fsid mounts"
+[ "$FSTYP" = "btrfs" ] && _fixed_by_kernel_commit xxxxxxxxxxxx \
+ "btrfs: derive f_fsid from on-disk fsuuid and dev_t"
+
+_cleanup()
+{
+ cd /
+ rm -r -f $tmp.*
+ _unmount $mnt1 2>/dev/null
+ _unmount $mnt2 2>/dev/null
+ _loop_image_destroy "${devs[@]}" 2> /dev/null
+}
+
+filter_pool()
+{
+ sed -e "s|${devs[0]}|DEV1|g" -e "s|$mnt1|MNT1|g" \
+ -e "s|${devs[1]}|DEV2|g" -e "s|$mnt2|MNT2|g" | _filter_spaces
+}
+
+do_ima()
+{
+ local ima_policy="/sys/kernel/security/ima/policy"
+ local ima_log="/sys/kernel/security/ima/ascii_runtime_measurements"
+ local fsuuid
+ local mnt=$1
+ local enable=$2
+
+ # Since the in-memory IMA audit log is only cleared upon reboot,
+ # use unique random filenames to avoid log collisions.
+ local foofile=$(mktemp --dry-run foobar_XXXXX)
+
+ echo $mnt $enable | filter_pool
+
+ [ -w "$ima_policy" ] || _notrun "IMA policy not writable"
+
+ fsuuid=$(blkid -s UUID -o value ${devs[0]})
+
+ # Load IMA policy to measure file access specifically for this
+ # filesystem UUID.
+ if [[ $enable -eq 1 ]]; then
+ echo "measure func=FILE_CHECK fsuuid=$fsuuid" > "$ima_policy" || \
+ _notrun "Policy rejected"
+ fi
+
+ # Create a file to trigger measurement and verify its entry in
+ # the IMA log.
+ echo "test_data" > $mnt/$foofile
+
+ # Field 5 of each $ima_log entry is the measured file path.
+ grep $foofile "$ima_log" | awk '{ print $5 }' | filter_pool | \
+ sed "s/$foofile/FOOBAR_FILE/"
+
+ echo "dbg: $mnt $fsuuid $foofile" >> $seqres.full
+ cat $ima_log | tail -1 >> $seqres.full
+ echo >> $seqres.full
+}
+
+devs=()
+_loop_image_create_clone devs
+mnt1=$TEST_DIR/$seq/mnt1
+mnt2=$TEST_DIR/$seq/mnt2
+mkdir -p $mnt1
+mkdir -p $mnt2
+
+_mount $(_common_dev_mount_options) $(_clone_mount_option) ${devs[0]} $mnt1 || \
+ _fail "Failed to mount dev1"
+_mount $(_common_dev_mount_options) $(_clone_mount_option) ${devs[1]} $mnt2 || \
+ _fail "Failed to mount dev2"
+
+do_ima $mnt1 1
+do_ima $mnt2 0
+
+# Btrfs uses in-memory dynamic temp_fsid
+echo mount cycle
+_unmount $mnt2
+_mount $(_common_dev_mount_options) $(_clone_mount_option) ${devs[1]} $mnt2 || _fail "Failed to mount dev2"
+
+do_ima $mnt1 0
+do_ima $mnt2 0
+
+status=0
+exit
diff --git a/tests/generic/804.out b/tests/generic/804.out
new file mode 100644
index 000000000000..9804181d6c17
--- /dev/null
+++ b/tests/generic/804.out
@@ -0,0 +1,10 @@
+QA output created by 804
+MNT1 1
+MNT1/FOOBAR_FILE
+MNT2 0
+MNT2/FOOBAR_FILE
+mount cycle
+MNT1 0
+MNT1/FOOBAR_FILE
+MNT2 0
+MNT2/FOOBAR_FILE
--
2.43.0
* [PATCH v5 06/10] fstests: verify libblkid resolution of duplicate UUIDs
From: Anand Jain @ 2026-05-21 12:54 UTC (permalink / raw)
To: fstests
Cc: linux-btrfs, linux-ext4, linux-xfs, linux-f2fs, amir73il, zlang,
hch
In-Reply-To: <cover.1779367627.git.asj@kernel.org>
Verify how findmnt, df (libblkid) resolve device paths when multiple
block devices share the same FSUUID.
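A minimal sketch of the per-device lookup the test relies on: when two mounts share a UUID, matching on the device path still gives an unambiguous answer. The helper name `mount_of_dev` and the mounts table below are hypothetical; the table only mimics the /proc/self/mounts layout.

```shell
#!/bin/bash
# Hypothetical helper: print "device mountpoint" for one device from a
# table in /proc/self/mounts format.
mount_of_dev()
{
    local dev=$1 table=$2
    awk -v dev="$dev" '$1 == dev { print $1, $2 }' "$table"
}

# Synthetic table: two clones with the same UUID on different devices.
table=$(mktemp)
cat > "$table" <<'EOF'
/dev/loop0 /mnt/test/mnt1 xfs rw,nouuid 0 0
/dev/loop1 /mnt/test/mnt2 xfs rw,nouuid 0 0
EOF

first=$(mount_of_dev /dev/loop0 "$table")
second=$(mount_of_dev /dev/loop1 "$table")
echo "$first"
echo "$second"
rm -f "$table"
```

The test applies the same idea to the live mount table and additionally cross-checks findmnt and df.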
Signed-off-by: Anand Jain <asj@kernel.org>
---
tests/generic/803 | 76 +++++++++++++++++++++++++++++++++++++++++++
tests/generic/803.out | 19 +++++++++++
2 files changed, 95 insertions(+)
create mode 100644 tests/generic/803
create mode 100644 tests/generic/803.out
diff --git a/tests/generic/803 b/tests/generic/803
new file mode 100644
index 000000000000..36de7887065e
--- /dev/null
+++ b/tests/generic/803
@@ -0,0 +1,76 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2026 Anand Jain <asj@kernel.org>. All Rights Reserved.
+#
+# FS QA Test 803
+# Verify how libblkid resolves devices when multiple devices share the
+# same FSUUID.
+
+. ./common/preamble
+. ./common/filter
+
+_begin_fstest auto quick mount clone
+
+_require_test
+_require_block_device $TEST_DEV
+_require_loop
+
+_cleanup()
+{
+ cd /
+ rm -r -f $tmp.*
+ umount $mnt1 $mnt2 2>/dev/null
+ _loop_image_destroy "${devs[@]}" 2> /dev/null
+}
+
+filter_pool()
+{
+ sed -e "s|${devs[0]}|DEV1|g" -e "s|${mnt1}|MNT1|g" \
+ -e "s|${devs[1]}|DEV2|g" -e "s|${mnt2}|MNT2|g" | _filter_spaces
+}
+
+print_info()
+{
+ local mntpt=$1
+ local tgt=$(findmnt -no SOURCE $mntpt)
+ local fsuuid=$(blkid -s UUID -o value $tgt)
+
+ echo "mntpt=$mntpt tgt=$tgt fsuuid=$fsuuid" >> $seqres.full
+ echo
+ findmnt -o SOURCE,TARGET,UUID "$tgt" | tail -n +2 | \
+ sed -e "s/${fsuuid}/FSUUID/g" | filter_pool
+ awk -v dev="$tgt" '$1 == dev { print $1, $2 }' /proc/self/mounts | \
+ filter_pool
+ df --all --output=source,target "$tgt" | tail -n +2 | filter_pool
+}
+
+devs=()
+_loop_image_create_clone devs
+mkdir -p $TEST_DIR/$seq
+mnt1=$TEST_DIR/$seq/mnt1
+mnt2=$TEST_DIR/$seq/mnt2
+mkdir -p $mnt1
+mkdir -p $mnt2
+
+_mount $(_common_dev_mount_options) $(_clone_mount_option) ${devs[0]} $mnt1 || \
+ _fail "Failed to mount dev1"
+_mount $(_common_dev_mount_options) $(_clone_mount_option) ${devs[1]} $mnt2 || \
+ _fail "Failed to mount dev2"
+
+print_info $mnt1
+print_info $mnt2
+
+echo
+echo "**** mount cycle ****"
+_unmount $mnt1
+_unmount $mnt2
+_mount $(_common_dev_mount_options) $(_clone_mount_option) ${devs[1]} $mnt2 || \
+ _fail "Failed to mount dev2"
+_mount $(_common_dev_mount_options) $(_clone_mount_option) ${devs[0]} $mnt1 || \
+ _fail "Failed to mount dev1"
+
+print_info $mnt1
+print_info $mnt2
+
+status=0
+exit
diff --git a/tests/generic/803.out b/tests/generic/803.out
new file mode 100644
index 000000000000..20a1cb36a213
--- /dev/null
+++ b/tests/generic/803.out
@@ -0,0 +1,19 @@
+QA output created by 803
+
+DEV1 MNT1 FSUUID
+DEV1 MNT1
+DEV1 MNT1
+
+DEV2 MNT2 FSUUID
+DEV2 MNT2
+DEV2 MNT2
+
+**** mount cycle ****
+
+DEV1 MNT1 FSUUID
+DEV1 MNT1
+DEV1 MNT1
+
+DEV2 MNT2 FSUUID
+DEV2 MNT2
+DEV2 MNT2
--
2.43.0
* [PATCH v5 05/10] fstests: verify f_fsid for cloned filesystems
From: Anand Jain @ 2026-05-21 12:54 UTC (permalink / raw)
To: fstests
Cc: linux-btrfs, linux-ext4, linux-xfs, linux-f2fs, amir73il, zlang,
hch
In-Reply-To: <cover.1779367627.git.asj@kernel.org>
Verify that the cloned filesystem provides an f_fsid that is persistent
across mount cycles, yet unique from the original filesystem's f_fsid.
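A minimal sketch of the comparison behind this test, assuming GNU coreutils `stat`, whose `-f -c "%i"` prints the filesystem ID in hex. The helper name `compare_fsid` is hypothetical; the only case that can be exercised without mounting clones is a path compared with itself.

```shell
#!/bin/bash
# Hypothetical helper: report whether two paths live on filesystems
# that report the same f_fsid.
compare_fsid()
{
    local a b
    a=$(stat -f -c "%i" "$1") || return 1
    b=$(stat -f -c "%i" "$2") || return 1
    if [ "$a" = "$b" ]; then
        echo "identical"
    else
        echo "unique"
    fi
}

# A path compared with itself always reports "identical".
self_check=$(compare_fsid / /)
echo "$self_check"
```

With the original and the clone mounted, the test expects the two mount points to report different values, both before and after the mount cycle.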
Signed-off-by: Anand Jain <asj@kernel.org>
---
tests/generic/802 | 62 +++++++++++++++++++++++++++++++++++++++++++
tests/generic/802.out | 7 +++++
2 files changed, 69 insertions(+)
create mode 100644 tests/generic/802
create mode 100644 tests/generic/802.out
diff --git a/tests/generic/802 b/tests/generic/802
new file mode 100644
index 000000000000..31044695f3a8
--- /dev/null
+++ b/tests/generic/802
@@ -0,0 +1,62 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2026 Anand Jain <asj@kernel.org>. All Rights Reserved.
+#
+# FS QA Test 802
+# Verify f_fsid and s_uuid of cloned filesystems across mount cycle.
+
+. ./common/preamble
+
+_begin_fstest auto quick mount clone
+
+_require_test
+_require_block_device $TEST_DEV
+_require_loop
+
+[ "$FSTYP" = "btrfs" ] && _fixed_by_kernel_commit xxxxxxxxxxxx \
+ "btrfs: use on-disk uuid for s_uuid in temp_fsid mounts"
+[ "$FSTYP" = "btrfs" ] && _fixed_by_kernel_commit xxxxxxxxxxxx \
+ "btrfs: derive f_fsid from on-disk fsuuid and dev_t"
+
+_cleanup()
+{
+ cd /
+ rm -r -f $tmp.*
+ umount $mnt1 $mnt2 2>/dev/null
+ _loop_image_destroy "${devs[@]}" 2> /dev/null
+}
+
+devs=()
+_loop_image_create_clone devs
+mkdir -p $TEST_DIR/$seq
+mnt1=$TEST_DIR/$seq/mnt1
+mnt2=$TEST_DIR/$seq/mnt2
+mkdir -p $mnt1
+mkdir -p $mnt2
+
+_mount $(_common_dev_mount_options) $(_clone_mount_option) ${devs[0]} $mnt1 || \
+ _fail "Failed to mount dev1"
+_mount $(_common_dev_mount_options) $(_clone_mount_option) ${devs[1]} $mnt2 || \
+ _fail "Failed to mount dev2"
+
+fsid_scratch=$(stat -f -c "%i" $mnt1)
+fsid_clone=$(stat -f -c "%i" $mnt2)
+
+echo "**** fsid initially ****"
+echo $fsid_scratch | sed -e "s/$fsid_scratch/FSID_SCRATCH/g"
+echo $fsid_clone | sed -e "s/$fsid_clone/FSID_CLONE/g"
+
+# Make sure fsid still matches across a mount cycle; also reverse the order.
+echo "**** fsid after mount cycle ****"
+_unmount $mnt1
+_unmount $mnt2
+_mount $(_common_dev_mount_options) $(_clone_mount_option) ${devs[1]} $mnt2 || \
+ _fail "Failed to mount dev2"
+_mount $(_common_dev_mount_options) $(_clone_mount_option) ${devs[0]} $mnt1 || \
+ _fail "Failed to mount dev1"
+
+stat -f -c "%i" $mnt1 | sed -e "s/$fsid_scratch/FSID_SCRATCH/g"
+stat -f -c "%i" $mnt2 | sed -e "s/$fsid_clone/FSID_CLONE/g"
+
+status=0
+exit
diff --git a/tests/generic/802.out b/tests/generic/802.out
new file mode 100644
index 000000000000..d1e008f122bb
--- /dev/null
+++ b/tests/generic/802.out
@@ -0,0 +1,7 @@
+QA output created by 802
+**** fsid initially ****
+FSID_SCRATCH
+FSID_CLONE
+**** fsid after mount cycle ****
+FSID_SCRATCH
+FSID_CLONE
--
2.43.0
* [PATCH v5 04/10] fstests: verify fanotify isolation on cloned filesystems
From: Anand Jain @ 2026-05-21 12:54 UTC (permalink / raw)
To: fstests
Cc: linux-btrfs, linux-ext4, linux-xfs, linux-f2fs, amir73il, zlang,
hch
In-Reply-To: <cover.1779367627.git.asj@kernel.org>
Verify that fanotify events are correctly routed to the appropriate
watcher when cloned filesystems are mounted.
This helps verify that the kernel's event notification distinguishes
between devices sharing the same FSID/UUID.
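The test matches watcher output against the f_fsid split into two 32-bit halves. The following stand-alone sketch reproduces that string transformation with made-up input values, so the padding and the stripping of leading zeros are easy to follow.

```shell
#!/bin/bash
# Convert a hex f_fsid string into "hi.lo": pad to 16 hex digits, split
# into two 32-bit halves, and drop leading zeros from each half.
fsid_to_fid_parts()
{
    local fsid=$1
    local padded hi lo
    padded=$(printf '%016x' "0x${fsid}")
    hi=$(printf '%x' "0x${padded:0:8}")
    lo=$(printf '%x' "0x${padded:8:8}")
    echo "${hi}.${lo}"
}

# Made-up values, chosen only to show the padding behaviour.
parts_a=$(fsid_to_fid_parts 1a2b3c4d5e6f)
parts_b=$(fsid_to_fid_parts 700000002)
echo "$parts_a"
echo "$parts_b"
```

The first input pads to 00001a2b3c4d5e6f and the second to 0000000700000002, which is why the halves come out as 1a2b.3c4d5e6f and 7.2.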
Signed-off-by: Anand Jain <asj@kernel.org>
---
common/config | 1 +
tests/generic/801 | 125 ++++++++++++++++++++++++++++++++++++++++++
tests/generic/801.out | 7 +++
3 files changed, 133 insertions(+)
create mode 100644 tests/generic/801
create mode 100644 tests/generic/801.out
diff --git a/common/config b/common/config
index 605a57947a40..1588bdcb1aa1 100644
--- a/common/config
+++ b/common/config
@@ -243,6 +243,7 @@ export PARTED_PROG="$(type -P parted)"
export XFS_PROPERTY_PROG="$(type -P xfs_property)"
export FSCRYPTCTL_PROG="$(type -P fscryptctl)"
export INOTIFYWAIT_PROG="$(type -P inotifywait)"
+export FSNOTIFYWAIT_PROG="$(type -P fsnotifywait)"
# udev wait functions.
#
diff --git a/tests/generic/801 b/tests/generic/801
new file mode 100644
index 000000000000..5cbdfb85539b
--- /dev/null
+++ b/tests/generic/801
@@ -0,0 +1,125 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2026 Anand Jain <asj@kernel.org>. All Rights Reserved.
+#
+# FS QA Test 801
+# Verify fanotify FID functionality on cloned filesystems by setting up
+# watchers and making sure notifications are in the correct log files.
+
+. ./common/preamble
+
+_begin_fstest auto quick mount clone
+
+_require_test
+_require_block_device $TEST_DEV
+_require_loop
+_require_command "$FSNOTIFYWAIT_PROG" fsnotifywait
+
+_cleanup()
+{
+ cd /
+ [[ -n $pid1 ]] && { kill -TERM "$pid1" 2> /dev/null; wait $pid1; }
+ [[ -n $pid2 ]] && { kill -TERM "$pid2" 2> /dev/null; wait $pid2; }
+
+ if [ "$semanage_added" = "yes" ]; then
+ semanage permissive -d unconfined_t >/dev/null 2>&1 || true
+ fi
+
+ umount $mnt1 $mnt2 2>/dev/null
+ _loop_image_destroy "${devs[@]}" 2> /dev/null
+ rm -r -f $tmp.*
+}
+
+monitor_fanotify()
+{
+ local mmnt=$1
+ exec stdbuf -oL $FSNOTIFYWAIT_PROG -m -F -S -e create "$mmnt" 2>&1
+}
+
+fsid_to_fid_parts()
+{
+ local fsid=$1
+ # Pad to 16 hex chars (64-bit), then split into two 32-bit halves
+ local padded=$(printf '%016x' "0x${fsid}")
+ local hi=$(printf '%x' "0x${padded:0:8}") # strips leading zeros
+ local lo=$(printf '%x' "0x${padded:8:8}") # strips leading zeros
+ echo "${hi}.${lo}"
+}
+
+devs=()
+_loop_image_create_clone devs
+mkdir -p $TEST_DIR/$seq
+mnt1=$TEST_DIR/$seq/mnt1
+mnt2=$TEST_DIR/$seq/mnt2
+mkdir -p $mnt1
+mkdir -p $mnt2
+
+_mount $(_common_dev_mount_options) $(_clone_mount_option) ${devs[0]} $mnt1 || \
+ _fail "Failed to mount dev1"
+_mount $(_common_dev_mount_options) $(_clone_mount_option) ${devs[1]} $mnt2 || \
+ _fail "Failed to mount dev2"
+
+fsid1=$(stat -f -c "%i" $mnt1)
+fsid2=$(stat -f -c "%i" $mnt2)
+
+[[ "$fsid1" == "$fsid2" ]] && \
+ _notrun "Require clone filesystem with unique f_fsid"
+
+log1=$tmp.fanotify1
+log2=$tmp.fanotify2
+
+pid1=""
+pid2=""
+echo "Setup FID fanotify watchers on both mnt1 and mnt2"
+semanage_added="no"
+if [ "$(getenforce 2>/dev/null)" = "Enforcing" ]; then
+ if ! semanage permissive -l | grep -q "unconfined_t"; then
+ semanage permissive -a unconfined_t >/dev/null 2>&1 && semanage_added="yes"
+ fi
+fi
+
+( monitor_fanotify "$mnt1" > "$log1" ) &
+pid1=$!
+( monitor_fanotify "$mnt2" > "$log2" ) &
+pid2=$!
+sleep 2
+
+echo "Trigger file creation on mnt1"
+touch $mnt1/file_on_mnt1
+sync
+sleep 1
+
+echo "Trigger file creation on mnt2"
+touch $mnt2/file_on_mnt2
+sync
+sleep 1
+
+echo "Verify fsid in the fanotify"
+kill $pid1 $pid2
+wait $pid1 $pid2 2>/dev/null
+pid1=""
+pid2=""
+
+e_fsid1=$(fsid_to_fid_parts "$fsid1")
+e_fsid2=$(fsid_to_fid_parts "$fsid2")
+
+echo $fsid1 $e_fsid1 $fsid2 $e_fsid2 >> $seqres.full
+cat $log1 >> $seqres.full
+cat $log2 >> $seqres.full
+
+if grep -qF "$e_fsid1" "$log1" && ! grep -qF "$e_fsid2" "$log1"; then
+ echo "SUCCESS: mnt1 events found"
+else
+ [ ! -s "$log1" ] && echo " - mnt1 received no events."
+ grep -qF "$e_fsid2" "$log1" && echo " - mnt1 received event from mnt2."
+fi
+
+if grep -qF "$e_fsid2" "$log2" && ! grep -qF "$e_fsid1" "$log2"; then
+ echo "SUCCESS: mnt2 events found"
+else
+ [ ! -s "$log2" ] && echo " - mnt2 received no events."
+ grep -qF "$e_fsid1" "$log2" && echo " - mnt2 received event from mnt1."
+fi
+
+status=0
+exit
diff --git a/tests/generic/801.out b/tests/generic/801.out
new file mode 100644
index 000000000000..d7b318d9f27c
--- /dev/null
+++ b/tests/generic/801.out
@@ -0,0 +1,7 @@
+QA output created by 801
+Setup FID fanotify watchers on both mnt1 and mnt2
+Trigger file creation on mnt1
+Trigger file creation on mnt2
+Verify fsid in the fanotify
+SUCCESS: mnt1 events found
+SUCCESS: mnt2 events found
--
2.43.0
* [PATCH v5 03/10] fstests: add test for inotify isolation on cloned devices
From: Anand Jain @ 2026-05-21 12:54 UTC (permalink / raw)
To: fstests
Cc: linux-btrfs, linux-ext4, linux-xfs, linux-f2fs, amir73il, zlang,
hch
In-Reply-To: <cover.1779367627.git.asj@kernel.org>
Add a new test to verify that the kernel correctly differentiates between
two block devices sharing the same FSID/UUID.
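A minimal sketch of the isolation check applied to each watcher log: the log must contain the file created on its own mount and must not contain the file created on the other one. The helper name `events_isolated` and the two sample logs are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the log has the expected name
# and does not have the foreign one.
events_isolated()
{
    local log=$1 own=$2 foreign=$3
    grep -q -- "$own" "$log" && ! grep -q -- "$foreign" "$log"
}

# Synthetic logs: one clean, one polluted by the other mount's event.
log_ok=$(mktemp)
log_mixed=$(mktemp)
echo "file_on_mnt1" > "$log_ok"
printf 'file_on_mnt1\nfile_on_mnt2\n' > "$log_mixed"

if events_isolated "$log_ok" file_on_mnt1 file_on_mnt2; then
    r1=isolated
else
    r1=confused
fi
if events_isolated "$log_mixed" file_on_mnt1 file_on_mnt2; then
    r2=isolated
else
    r2=confused
fi
echo "$r1 $r2"
rm -f "$log_ok" "$log_mixed"
```

In the test the logs are produced by two inotifywait processes, one per mount point.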
Signed-off-by: Anand Jain <asj@kernel.org>
---
common/config | 1 +
tests/generic/800 | 89 +++++++++++++++++++++++++++++++++++++++++++
tests/generic/800.out | 7 ++++
3 files changed, 97 insertions(+)
create mode 100644 tests/generic/800
create mode 100644 tests/generic/800.out
diff --git a/common/config b/common/config
index 4fd4c2c8af11..605a57947a40 100644
--- a/common/config
+++ b/common/config
@@ -242,6 +242,7 @@ export BTRFS_MAP_LOGICAL_PROG=$(type -P btrfs-map-logical)
export PARTED_PROG="$(type -P parted)"
export XFS_PROPERTY_PROG="$(type -P xfs_property)"
export FSCRYPTCTL_PROG="$(type -P fscryptctl)"
+export INOTIFYWAIT_PROG="$(type -P inotifywait)"
# udev wait functions.
#
diff --git a/tests/generic/800 b/tests/generic/800
new file mode 100644
index 000000000000..4b9bd3e4f487
--- /dev/null
+++ b/tests/generic/800
@@ -0,0 +1,89 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2026 Anand Jain <asj@kernel.org>. All Rights Reserved.
+#
+# FS QA Test 800
+#
+# Verify if the kernel or userspace becomes confused when two block devices
+# share the same fid/fsid/uuid. Create inotify watchers on both the original
+# and cloned filesystems. Monitor the notifications in the respective logs.
+
+. ./common/preamble
+
+_begin_fstest auto quick mount clone
+
+_require_test
+_require_block_device $TEST_DEV
+_require_loop
+_require_command "$INOTIFYWAIT_PROG" inotifywait
+
+_cleanup()
+{
+ cd /
+ [[ -n $pid1 ]] && { kill -TERM "$pid1" 2> /dev/null; wait $pid1; }
+ [[ -n $pid2 ]] && { kill -TERM "$pid2" 2> /dev/null; wait $pid2; }
+ rm -r -f $tmp.*
+ _unmount $mnt1 2>/dev/null
+ _unmount $mnt2 2>/dev/null
+ _loop_image_destroy "${devs[@]}" 2> /dev/null
+}
+
+devs=()
+_loop_image_create_clone devs
+mkdir -p $TEST_DIR/$seq
+mnt1=$TEST_DIR/$seq/mnt1
+mnt2=$TEST_DIR/$seq/mnt2
+mkdir -p $mnt1
+mkdir -p $mnt2
+
+_mount $(_common_dev_mount_options) $(_clone_mount_option) ${devs[0]} $mnt1 || \
+ _fail "Failed to mount dev1"
+_mount $(_common_dev_mount_options) $(_clone_mount_option) ${devs[1]} $mnt2 || \
+ _fail "Failed to mount dev2"
+
+log1=$tmp.inotify1
+log2=$tmp.inotify2
+
+pid1=""
+pid2=""
+echo "Setup inotify watchers on both mnt1 and mnt2"
+$INOTIFYWAIT_PROG -m -e create --format '%f' $mnt1 > $log1 2>&1 &
+pid1=$!
+$INOTIFYWAIT_PROG -m -e create --format '%f' $mnt2 > $log2 2>&1 &
+pid2=$!
+sleep 2
+
+echo "Trigger file creation on mnt1"
+touch $mnt1/file_on_mnt1
+sync
+sleep 1
+
+echo "Trigger file creation on mnt2"
+touch $mnt2/file_on_mnt2
+sync
+sleep 1
+
+echo "Verify inotify isolation"
+kill $pid1 $pid2
+wait $pid1 $pid2 2>/dev/null
+pid1=""
+pid2=""
+
+if grep -q "file_on_mnt1" $log1 && ! grep -q "file_on_mnt2" $log1; then
+ echo "SUCCESS: mnt1 events isolated."
+else
+ echo "FAIL: mnt1 inotify confusion!"
+ [ ! -s $log1 ] && echo " - mnt1 received no events."
+ grep -q "file_on_mnt2" $log1 && echo " - mnt1 received event from mnt2."
+fi
+
+if grep -q "file_on_mnt2" $log2 && ! grep -q "file_on_mnt1" $log2; then
+ echo "SUCCESS: mnt2 events isolated."
+else
+ echo "FAIL: mnt2 inotify confusion!"
+ [ ! -s $log2 ] && echo " - mnt2 received no events."
+ grep -q "file_on_mnt1" $log2 && echo " - mnt2 received event from mnt1."
+fi
+
+status=0
+exit
diff --git a/tests/generic/800.out b/tests/generic/800.out
new file mode 100644
index 000000000000..b10842a31210
--- /dev/null
+++ b/tests/generic/800.out
@@ -0,0 +1,7 @@
+QA output created by 800
+Setup inotify watchers on both mnt1 and mnt2
+Trigger file creation on mnt1
+Trigger file creation on mnt2
+Verify inotify isolation
+SUCCESS: mnt1 events isolated.
+SUCCESS: mnt2 events isolated.
--
2.43.0
* [PATCH v5 02/10] fstests: add _clone_mount_option() helper
From: Anand Jain @ 2026-05-21 12:54 UTC (permalink / raw)
To: fstests
Cc: linux-btrfs, linux-ext4, linux-xfs, linux-f2fs, amir73il, zlang,
hch
In-Reply-To: <cover.1779367627.git.asj@kernel.org>
Add a _clone_mount_option() helper function to handle filesystem-specific
requirements for mounting cloned devices. It abstracts the need for -o nouuid
on XFS.
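A stand-alone sketch of how the helper behaves for two filesystem types. The function here is a simplified copy under a different name (`clone_mount_option`, no leading underscore) and `FSTYP` is set by hand, whereas fstests takes it from its configuration.

```shell
#!/bin/bash
# Simplified copy of the helper's logic: only XFS needs an extra
# option to mount a filesystem whose UUID is already in use.
clone_mount_option()
{
    case "$FSTYP" in
    xfs)
        echo "-o nouuid"
        ;;
    *)
        ;;
    esac
}

FSTYP=xfs
xfs_opts=$(clone_mount_option)
FSTYP=ext4
ext4_opts=$(clone_mount_option)
echo "xfs: '$xfs_opts' ext4: '$ext4_opts'"
```

Tests place the result unquoted on the mount command line, so an empty result simply contributes no arguments.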
Signed-off-by: Anand Jain <asj@kernel.org>
---
common/rc | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/common/rc b/common/rc
index a2ee23c45003..7ae9877918c8 100644
--- a/common/rc
+++ b/common/rc
@@ -397,6 +397,20 @@ _scratch_mount_options()
$SCRATCH_DEV $SCRATCH_MNT
}
+_clone_mount_option()
+{
+ local mount_opts=""
+
+ case "$FSTYP" in
+ xfs)
+ mount_opts="-o nouuid"
+ ;;
+ *)
+ esac
+
+ echo $mount_opts
+}
+
_supports_filetype()
{
local dir=$1
--
2.43.0
* [PATCH v5 01/10] fstests: add _loop_image_create_clone() helper
From: Anand Jain @ 2026-05-21 12:54 UTC (permalink / raw)
To: fstests
Cc: linux-btrfs, linux-ext4, linux-xfs, linux-f2fs, amir73il, zlang,
hch
In-Reply-To: <cover.1779367627.git.asj@kernel.org>
Introduce _loop_image_create_clone(), which runs mkfs on an image file,
clones it to a second image file, and attaches a loop device to each.
Also introduce _loop_image_destroy() to tear them down.
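A minimal sketch of the cloning step on its own, leaving out mkfs and loop device setup because both need root. It assumes GNU coreutils (`truncate`, `cmp`). The helper name `clone_image` and the marker string are hypothetical. The point is that a plain file copy yields a byte-identical image, which is why the clone carries the same on-disk UUID as the original.

```shell
#!/bin/bash
# Hypothetical helper: copy an image file and confirm the copy is
# byte-for-byte identical to the source.
clone_image()
{
    local src=$1 dst=$2
    cp "$src" "$dst" && cmp -s "$src" "$dst"
}

workdir=$(mktemp -d)
# Stand-in for a formatted image: a marker followed by zero padding.
printf 'fake-superblock' > "$workdir/orig.img"
truncate -s 1M "$workdir/orig.img"

if clone_image "$workdir/orig.img" "$workdir/clone.img"; then
    clone_status=identical
else
    clone_status=differs
fi
echo "$clone_status"
rm -rf "$workdir"
```

The real helper syncs the loop device before copying so that the image file contains everything mkfs wrote.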
Signed-off-by: Anand Jain <asj@kernel.org>
---
common/rc | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 50 insertions(+)
diff --git a/common/rc b/common/rc
index 9632b211b58f..a2ee23c45003 100644
--- a/common/rc
+++ b/common/rc
@@ -1503,6 +1503,56 @@ _scratch_resvblks()
esac
}
+_loop_image_create_clone()
+{
+ local -n _ret=$1
+ local pre_clone_tune_func="$2"
+ local img_file=$TEST_DIR/${seq}.img
+ local img_file_clone=$TEST_DIR/${seq}_clone.img
+ local size=$(_small_fs_size_mb 128) # Smallest possible
+ local loop_devs
+
+ _require_fs_space $TEST_DIR $((size * 1024))
+
+ _create_file_sized $((size * 1024 * 1024)) $img_file ||
+ _fail "Failed: Create $img_file $size"
+
+ loop_devs=$(_create_loop_device $img_file)
+ _ret=($loop_devs)
+
+ case $FSTYP in
+ xfs)
+ _mkfs_dev "-s size=4096" ${loop_devs[0]}
+ ;;
+ btrfs)
+ _mkfs_dev ${loop_devs[0]}
+ ;;
+ *)
+ _mkfs_dev ${loop_devs[0]}
+ ;;
+ esac
+
+ # Only execute if the function argument is not empty
+ if [ -n "$pre_clone_tune_func" ]; then
+ $pre_clone_tune_func ${loop_devs[0]}
+ fi
+
+ sync ${loop_devs[0]}
+ cp $img_file $img_file_clone
+
+ loop_devs="$loop_devs $(_create_loop_device $img_file_clone)"
+
+ _ret=($loop_devs)
+}
+
+_loop_image_destroy()
+{
+ for d in "$@"; do
+ local f=$(losetup --noheadings --output BACK-FILE $d)
+ _destroy_loop_device "$d"
+ [ -n "$f" ] && rm -f "$f"
+ done
+}
# Repair scratch filesystem. Returns 0 if the FS is good to go (either no
# errors found or errors were fixed) and nonzero otherwise; also spits out
--
2.43.0
* [PATCH v5 0/10] fstests: add test coverage for cloned filesystem ids
From: Anand Jain @ 2026-05-21 12:54 UTC (permalink / raw)
To: fstests
Cc: linux-btrfs, linux-ext4, linux-xfs, linux-f2fs, amir73il, zlang,
hch
v5:
XFS supports metadata_uuid, so I converted patch 10/10 to a generic
test. To do this, I moved the in-test helper to common/rc as
`pre_clone_tune_uuid()`.
I also verified fsnotifywait with SELinux in enforcing mode and
fixed patch 4/10.
v4:
https://lore.kernel.org/fstests/cover.1777357320.git.asj@kernel.org
v3:
https://lore.kernel.org/fstests/cover.1777281778.git.asj@kernel.org
v2:
https://lore.kernel.org/fstests/cover.1774090817.git.asj@kernel.org
v1:
https://lore.kernel.org/fstests/cover.1772095513.git.asj@kernel.org
This series adds fstests infrastructure and test cases to verify correct
filesystem identity when a filesystem is cloned (block-level copy).
The tests cover inotify, fanotify, f_fsid, libblkid, IMA, exportfs file
handles, and libblkid tools with metadata_uuid.
New helpers:
- _loop_image_create_clone() helper to create a cloned filesystem
- _clone_mount_option() helper to apply per-filesystem clone mount options
- pre_clone_tune_uuid() changes the UUID before the clone
New tests:
- inotify and fanotify events are isolated between cloned filesystems
- f_fsid is unique across cloned filesystem instances
- libblkid correctly resolves duplicate UUIDs to distinct devices
with and without metadata_uuid
- IMA distinct identity for each cloned filesystem
- exportfs file handles resolve correctly on cloned filesystems
Kernel Patches:
Requires Btrfs kernel patches for all tests to pass.
[1] https://lore.kernel.org/linux-btrfs/cover.1777281686.git.asj@kernel.org
Anand Jain (10):
fstests: add _loop_image_create_clone() helper
fstests: add _clone_mount_option() helper
fstests: add test for inotify isolation on cloned devices
fstests: verify fanotify isolation on cloned filesystems
fstests: verify f_fsid for cloned filesystems
fstests: verify libblkid resolution of duplicate UUIDs
fstests: verify IMA isolation on cloned filesystems
fstests: verify exportfs file handles on cloned filesystems
fstests: add pre_clone_tune_uuid() helper
fstests: test UUID consistency for clones with metadata_uuid
common/config | 2 +
common/rc | 84 ++++++++++++++++++++++++++++
tests/generic/800 | 89 ++++++++++++++++++++++++++++++
tests/generic/800.out | 7 +++
tests/generic/801 | 125 ++++++++++++++++++++++++++++++++++++++++++
tests/generic/801.out | 7 +++
tests/generic/802 | 62 +++++++++++++++++++++
tests/generic/802.out | 7 +++
tests/generic/803 | 76 +++++++++++++++++++++++++
tests/generic/803.out | 19 +++++++
tests/generic/804 | 103 ++++++++++++++++++++++++++++++++++
tests/generic/804.out | 10 ++++
tests/generic/805 | 73 ++++++++++++++++++++++++
tests/generic/805.out | 2 +
tests/generic/806 | 78 ++++++++++++++++++++++++++
tests/generic/806.out | 19 +++++++
16 files changed, 763 insertions(+)
create mode 100644 tests/generic/800
create mode 100644 tests/generic/800.out
create mode 100644 tests/generic/801
create mode 100644 tests/generic/801.out
create mode 100644 tests/generic/802
create mode 100644 tests/generic/802.out
create mode 100644 tests/generic/803
create mode 100644 tests/generic/803.out
create mode 100644 tests/generic/804
create mode 100644 tests/generic/804.out
create mode 100644 tests/generic/805
create mode 100644 tests/generic/805.out
create mode 100644 tests/generic/806
create mode 100644 tests/generic/806.out
--
2.43.0
* Re: [PATCH v10 00/22] fs-verity support for XFS with post EOF merkle tree
From: Carlos Maiolino @ 2026-05-21 11:42 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: Christoph Hellwig, Andrey Albershteyn, linux-xfs, fsverity,
linux-fsdevel, ebiggers, linux-ext4, linux-f2fs-devel,
linux-btrfs, linux-unionfs, djwong, david, brauner, amir73il,
miklos
In-Reply-To: <ag7uW7KQ5mVyCGEv@nidhogg.toxiclabs.cc>
On Thu, May 21, 2026 at 01:38:45PM +0200, Carlos Maiolino wrote:
> On Thu, May 21, 2026 at 11:42:13AM +0200, Andrey Albershteyn wrote:
> > On 2026-05-21 11:07:05, Christoph Hellwig wrote:
> > > On Wed, May 20, 2026 at 02:36:58PM +0200, Andrey Albershteyn wrote:
> > > > This series based on v7.1-rc4.
> > >
> > > How are we going to merge this? It touches at three subsystem trees
> > > (fsverity, vfs/iomap, xfs) so some coordination will be needed.
> >
> > As most of the patches are xfs, it's probably make sense to go
> > through xfs tree
> >
> > Carlos, what do you think?
>
> I was expecting this to come through xfs tree too if Eric and Christian
> agree.
> FWIW I'm adding Christian to the Cc
Whoops... Also adding Amir and Miklos to the Cc list due to the overlayfs
patch:
>
> >
> > --
> > - Andrey
> >
> >
* Re: [PATCH v10 00/22] fs-verity support for XFS with post EOF merkle tree
From: Carlos Maiolino @ 2026-05-21 11:38 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: Christoph Hellwig, Andrey Albershteyn, linux-xfs, fsverity,
linux-fsdevel, ebiggers, linux-ext4, linux-f2fs-devel,
linux-btrfs, linux-unionfs, djwong, david, brauner
In-Reply-To: <gcw7zcg6p4s2egpufwizfig72g6ren7jfmuz5mqskkpb7xepww@e3v35j7bk7dc>
On Thu, May 21, 2026 at 11:42:13AM +0200, Andrey Albershteyn wrote:
> On 2026-05-21 11:07:05, Christoph Hellwig wrote:
> > On Wed, May 20, 2026 at 02:36:58PM +0200, Andrey Albershteyn wrote:
> > > This series based on v7.1-rc4.
> >
> > How are we going to merge this? It touches at three subsystem trees
> > (fsverity, vfs/iomap, xfs) so some coordination will be needed.
>
> As most of the patches are xfs, it's probably make sense to go
> through xfs tree
>
> Carlos, what do you think?
I was expecting this to come through xfs tree too if Eric and Christian
agree.
FWIW I'm adding Christian to the Cc
>
> --
> - Andrey
>
>
* Re: [PATCH v3] iomap: add simple read path for small direct I/O
From: Fengnan @ 2026-05-21 11:26 UTC (permalink / raw)
To: brauner, djwong, hch, ojaswin, dgc, linux-xfs, linux-fsdevel,
linux-ext4, linux-kernel, lidiangang
In-Reply-To: <20260515084028.98160-1-changfengnan@bytedance.com>
On 2026/5/15 16:40, Fengnan Chang wrote:
> When running 4K random read workloads on high-performance Gen5 NVMe
> SSDs, the software overhead in the iomap direct I/O path
> (__iomap_dio_rw) becomes a significant bottleneck.
>
> Using io_uring with poll mode for a 4K randread test on a raw block
> device:
> taskset -c 30 ./t/io_uring -p1 -d512 -b4096 -s32 -c32 -F1 -B1 -R1 -X1
> -n1 -P1 /dev/nvme10n1
> Result: ~3.2M IOPS
>
> Running the exact same workload on ext4 and XFS:
> taskset -c 30 ./t/io_uring -p1 -d512 -b4096 -s32 -c32 -F1 -B1 -R1 -X1
> -n1 -P1 /mnt/testfile
> Result: ~1.92M IOPS
>
> Profiling the ext4 workload reveals that a significant portion of CPU
> time is spent on memory allocation and the iomap state machine
> iteration:
> 5.33% [kernel] [k] __iomap_dio_rw
> 3.26% [kernel] [k] iomap_iter
> 2.37% [kernel] [k] iomap_dio_bio_iter
> 2.35% [kernel] [k] kfree
> 1.33% [kernel] [k] iomap_dio_complete
>
> Introduce simple reads to reduce the overhead of iomap, simple read path
> is triggered when the request satisfies:
> - I/O size is <= inode blocksize (fits in a single block, no splits).
> - No custom `iomap_dio_ops` (dops) registered by the filesystem.
>
> After this optimization, the heavy generic functions disappear from the
> profile, replaced by a single streamlined execution path:
> 4.83% [kernel] [k] iomap_dio_simple_read
>
> With this patch, 4K random read IOPS on ext4 increases from 1.92M to
> 2.19M in the original single-core io_uring poll-mode workload.
>
> Below are the test results using fio:
>
> fs workload qd simple=0 simple=1 gain
> ext4 libaio 1 18,295 18,314 +0.10%
> ext4 libaio 64 458,374 473,557 +3.31%
> ext4 libaio 256 456,944 471,865 +3.27%
> ext4 libaio 1024 459,058 476,433 +3.78%
> ext4 io_uring 1 18,882 18,897 +0.08%
> ext4 io_uring 64 552,607 576,712 +4.36%
> ext4 io_uring 256 552,330 576,603 +4.39%
> ext4 io_uring 1024 557,330 584,227 +4.83%
> ext4 io_uring_poll 1 19,387 19,407 +0.10%
> ext4 io_uring_poll 64 794,073 843,926 +6.28%
> ext4 io_uring_poll 256 794,081 852,765 +7.39%
> ext4 io_uring_poll 1024 679,857 739,773 +8.81%
> xfs libaio 1 18,314 18,344 +0.16%
> xfs libaio 64 459,008 477,249 +3.97%
> xfs libaio 256 455,308 475,960 +4.54%
> xfs libaio 1024 461,499 477,098 +3.38%
> xfs io_uring 1 18,867 18,895 +0.15%
> xfs io_uring 64 556,197 582,650 +4.76%
> xfs io_uring 256 556,946 582,802 +4.64%
> xfs io_uring 1024 562,361 591,056 +5.10%
> xfs io_uring_poll 1 19,380 19,406 +0.14%
> xfs io_uring_poll 64 797,734 849,029 +6.43%
> xfs io_uring_poll 256 796,156 852,550 +7.08%
> xfs io_uring_poll 1024 677,254 735,667 +8.62%
ping..
By the way, the reason why io_uring_poll performs worse than the previous
version is that there was an issue with the test script: sqthread_poll was
not enabled, my bad.
> v3:
> Test data updated based on v7.1-rc3.
>
> Signed-off-by: Fengnan Chang <changfengnan@bytedance.com>
> ---
> fs/iomap/direct-io.c | 384 +++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 373 insertions(+), 11 deletions(-)
>
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index b0a6549b38487..8b86ad7df4bbd 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -10,6 +10,9 @@
> #include <linux/iomap.h>
> #include <linux/task_io_accounting_ops.h>
> #include <linux/fserror.h>
> +#include <linux/kobject.h>
> +#include <linux/sysfs.h>
> +#include <linux/init.h>
> #include "internal.h"
> #include "trace.h"
>
> @@ -237,23 +240,29 @@ static void iomap_dio_done(struct iomap_dio *dio)
> iomap_dio_complete_work(&dio->aio.work);
> }
>
> -static void __iomap_dio_bio_end_io(struct bio *bio, bool inline_completion)
> +static inline void iomap_dio_bio_release_pages(struct bio *bio,
> + unsigned int dio_flags, bool error)
> {
> - struct iomap_dio *dio = bio->bi_private;
> -
> if (bio_integrity(bio))
> fs_bio_integrity_free(bio);
>
> - if (dio->flags & IOMAP_DIO_BOUNCE) {
> - bio_iov_iter_unbounce(bio, !!dio->error,
> - dio->flags & IOMAP_DIO_USER_BACKED);
> + if (dio_flags & IOMAP_DIO_BOUNCE) {
> + bio_iov_iter_unbounce(bio, error,
> + dio_flags & IOMAP_DIO_USER_BACKED);
> bio_put(bio);
> - } else if (dio->flags & IOMAP_DIO_USER_BACKED) {
> + } else if (dio_flags & IOMAP_DIO_USER_BACKED) {
> bio_check_pages_dirty(bio);
> } else {
> bio_release_pages(bio, false);
> bio_put(bio);
> }
> +}
> +
> +static void __iomap_dio_bio_end_io(struct bio *bio, bool inline_completion)
> +{
> + struct iomap_dio *dio = bio->bi_private;
> +
> + iomap_dio_bio_release_pages(bio, dio->flags, !!dio->error);
>
> /* Do not touch bio below, we just gave up our reference. */
>
> @@ -398,6 +407,14 @@ static ssize_t iomap_dio_bio_iter_one(struct iomap_iter *iter,
> return ret;
> }
>
> +static inline unsigned int iomap_dio_alignment(struct inode *inode,
> + struct block_device *bdev, unsigned int dio_flags)
> +{
> + if (dio_flags & IOMAP_DIO_FSBLOCK_ALIGNED)
> + return i_blocksize(inode);
> + return bdev_logical_block_size(bdev);
> +}
> +
> static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
> {
> const struct iomap *iomap = &iter->iomap;
> @@ -416,10 +433,7 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
> * File systems that write out of place and always allocate new blocks
> * need each bio to be block aligned as that's the unit of allocation.
> */
> - if (dio->flags & IOMAP_DIO_FSBLOCK_ALIGNED)
> - alignment = fs_block_size;
> - else
> - alignment = bdev_logical_block_size(iomap->bdev);
> + alignment = iomap_dio_alignment(inode, iomap->bdev, dio->flags);
>
> if ((pos | length) & (alignment - 1))
> return -EINVAL;
> @@ -891,12 +905,352 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> }
> EXPORT_SYMBOL_GPL(__iomap_dio_rw);
>
> +struct iomap_dio_simple_read {
> + struct kiocb *iocb;
> + size_t size;
> + unsigned int dio_flags;
> + atomic_t state;
> + union {
> + struct task_struct *waiter;
> + struct work_struct work;
> + };
> + /*
> + * Align @bio to a cacheline boundary so that, combined with the
> + * front_pad passed to bioset_init(), the bio sits at the start of
> + * a cacheline in memory returned by the (HWCACHE-aligned) bio
> + * slab. This keeps the hot fields block layer touches on submit
> + * and completion (bi_iter, bi_status, ...) within a single line.
> + */
> + struct bio bio ____cacheline_aligned_in_smp;
> +};
> +
> +static struct bio_set iomap_dio_simple_read_pool;
> +
> +/*
> + * In the async simple read path, we need to prevent bio_endio() from
> + * triggering iocb->ki_complete() before the submitter has returned
> + * -EIOCBQUEUED. Otherwise, the caller might free the iocb concurrently.
> + *
> + * We use a three-state rendezvous to synchronize the submitter and end_io:
> + *
> + * IOMAP_DIO_SIMPLE_SUBMITTING: Initial state set before submitting the bio.
> + *
> + * IOMAP_DIO_SIMPLE_QUEUED: The submitter has safely queued the IO and will
> + * return -EIOCBQUEUED. If end_io sees this state, it takes over and calls
> + * ki_complete().
> + *
> + * IOMAP_DIO_SIMPLE_DONE: end_io fired before the submitter finished the
> + * submit path. end_io sets this state and does nothing else. The submitter
> + * will see this state and handle the completion synchronously (bypassing
> + * ki_complete() and returning the actual result).
> + */
> +enum {
> + IOMAP_DIO_SIMPLE_SUBMITTING = 0,
> + IOMAP_DIO_SIMPLE_QUEUED,
> + IOMAP_DIO_SIMPLE_DONE,
> +};
> +
> +static ssize_t iomap_dio_simple_read_finish(struct kiocb *iocb,
> + struct bio *bio, ssize_t ret)
> +{
> + struct inode *inode = file_inode(iocb->ki_filp);
> + struct iomap_dio_simple_read *sr = bio->bi_private;
> +
> + if (likely(!ret)) {
> + ret = sr->size;
> + iocb->ki_pos += ret;
> + } else {
> + fserror_report_io(inode, FSERR_DIRECTIO_READ, iocb->ki_pos,
> + sr->size, ret, GFP_NOFS);
> + }
> +
> + iomap_dio_bio_release_pages(bio, sr->dio_flags, ret < 0);
> +
> + return ret;
> +}
> +
> +static ssize_t iomap_dio_simple_read_complete(struct kiocb *iocb,
> + struct bio *bio)
> +{
> + struct inode *inode = file_inode(iocb->ki_filp);
> + ssize_t ret;
> +
> + WRITE_ONCE(iocb->private, NULL);
> +
> + ret = iomap_dio_simple_read_finish(iocb, bio,
> + blk_status_to_errno(bio->bi_status));
> +
> + inode_dio_end(inode);
> + trace_iomap_dio_complete(iocb, ret < 0 ? ret : 0, ret > 0 ? ret : 0);
> + return ret;
> +}
> +
> +static void iomap_dio_simple_read_complete_work(struct work_struct *work)
> +{
> + struct iomap_dio_simple_read *sr =
> + container_of(work, struct iomap_dio_simple_read, work);
> + struct kiocb *iocb = sr->iocb;
> + ssize_t ret;
> +
> + ret = iomap_dio_simple_read_complete(iocb, &sr->bio);
> + iocb->ki_complete(iocb, ret);
> +}
> +
> +static void iomap_dio_simple_read_async_done(struct iomap_dio_simple_read *sr)
> +{
> + struct kiocb *iocb = sr->iocb;
> +
> + if (unlikely(sr->bio.bi_status)) {
> + struct inode *inode = file_inode(iocb->ki_filp);
> +
> + INIT_WORK(&sr->work, iomap_dio_simple_read_complete_work);
> + queue_work(inode->i_sb->s_dio_done_wq, &sr->work);
> + return;
> + }
> +
> + iomap_dio_simple_read_complete_work(&sr->work);
> +}
> +
> +static void iomap_dio_simple_read_end_io(struct bio *bio)
> +{
> + struct iomap_dio_simple_read *sr = bio->bi_private;
> +
> + if (sr->waiter) {
> + struct task_struct *waiter = sr->waiter;
> +
> + WRITE_ONCE(sr->waiter, NULL);
> + blk_wake_io_task(waiter);
> + return;
> + }
> +
> + if (likely(atomic_read(&sr->state) == IOMAP_DIO_SIMPLE_QUEUED) ||
> + atomic_cmpxchg(&sr->state, IOMAP_DIO_SIMPLE_SUBMITTING,
> + IOMAP_DIO_SIMPLE_DONE) == IOMAP_DIO_SIMPLE_QUEUED)
> + iomap_dio_simple_read_async_done(sr);
> +}
> +
> +static inline bool iomap_dio_simple_read_supported(struct kiocb *iocb,
> + struct iov_iter *iter, unsigned int dio_flags)
> +{
> + struct inode *inode = file_inode(iocb->ki_filp);
> + size_t count = iov_iter_count(iter);
> +
> + if (iov_iter_rw(iter) != READ)
> + return false;
> + if (!count)
> + return false;
> + /*
> + * Simple read is an optimization for small IO. Filter out large IO
> + * early as it's the most common case to fail for typical direct IO
> + * workloads.
> + */
> + if (count > inode->i_sb->s_blocksize)
> + return false;
> + if (dio_flags & (IOMAP_DIO_FORCE_WAIT | IOMAP_DIO_PARTIAL))
> + return false;
> + if (iocb->ki_pos + count > i_size_read(inode))
> + return false;
> +
> + return true;
> +}
> +
> +static ssize_t iomap_dio_simple_read(struct kiocb *iocb,
> + struct iov_iter *iter, const struct iomap_ops *ops,
> + void *private, unsigned int dio_flags)
> +{
> + struct inode *inode = file_inode(iocb->ki_filp);
> + size_t count = iov_iter_count(iter);
> + int nr_pages;
> + struct iomap_dio_simple_read *sr;
> + unsigned int alignment;
> + struct iomap_iter iomi = {
> + .inode = inode,
> + .pos = iocb->ki_pos,
> + .len = count,
> + .flags = IOMAP_DIRECT,
> + .private = private,
> + };
> + struct bio *bio;
> + bool wait_for_completion = is_sync_kiocb(iocb);
> + ssize_t ret;
> +
> + if (dio_flags & IOMAP_DIO_BOUNCE)
> + nr_pages = bio_iov_bounce_nr_vecs(iter, REQ_OP_READ);
> + else
> + nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS);
> +
> + if (iocb->ki_flags & IOCB_NOWAIT)
> + iomi.flags |= IOMAP_NOWAIT;
> +
> + ret = kiocb_write_and_wait(iocb, count);
> + if (ret)
> + return ret;
> +
> + inode_dio_begin(inode);
> +
> + ret = ops->iomap_begin(inode, iomi.pos, count, iomi.flags,
> + &iomi.iomap, &iomi.srcmap);
> + if (ret) {
> + inode_dio_end(inode);
> + return ret;
> + }
> +
> + if (iomi.iomap.type != IOMAP_MAPPED ||
> + iomi.iomap.offset > iomi.pos ||
> + iomi.iomap.offset + iomi.iomap.length < iomi.pos + count ||
> + (iomi.iomap.flags & IOMAP_F_INTEGRITY)) {
> + ret = -ENOTBLK;
> + goto out_iomap_end;
> + }
> +
> + alignment = iomap_dio_alignment(inode, iomi.iomap.bdev, dio_flags);
> + if ((iomi.pos | count) & (alignment - 1)) {
> + ret = -EINVAL;
> + goto out_iomap_end;
> + }
> +
> + if (!wait_for_completion && unlikely(!inode->i_sb->s_dio_done_wq)) {
> + ret = sb_init_dio_done_wq(inode->i_sb);
> + if (ret < 0)
> + goto out_iomap_end;
> + }
> +
> + trace_iomap_dio_rw_begin(iocb, iter, dio_flags, 0);
> +
> + if (user_backed_iter(iter))
> + dio_flags |= IOMAP_DIO_USER_BACKED;
> +
> + bio = bio_alloc_bioset(iomi.iomap.bdev, nr_pages,
> + REQ_OP_READ | REQ_SYNC | REQ_IDLE,
> + GFP_KERNEL, &iomap_dio_simple_read_pool);
> + sr = container_of(bio, struct iomap_dio_simple_read, bio);
> +
> + fscrypt_set_bio_crypt_ctx(bio, inode, iomi.pos, GFP_KERNEL);
> + sr->iocb = iocb;
> + sr->dio_flags = dio_flags;
> +
> + bio->bi_iter.bi_sector = iomap_sector(&iomi.iomap, iomi.pos);
> + bio->bi_ioprio = iocb->ki_ioprio;
> + bio->bi_private = sr;
> + bio->bi_end_io = iomap_dio_simple_read_end_io;
> +
> + if (dio_flags & IOMAP_DIO_BOUNCE)
> + ret = bio_iov_iter_bounce(bio, iter, count);
> + else
> + ret = bio_iov_iter_get_pages(bio, iter, alignment - 1);
> + if (unlikely(ret))
> + goto out_bio_put;
> +
> + if (bio->bi_iter.bi_size != count) {
> + iov_iter_revert(iter, bio->bi_iter.bi_size);
> + ret = -ENOTBLK;
> + goto out_bio_release_pages;
> + }
> +
> + sr->size = bio->bi_iter.bi_size;
> +
> + if ((dio_flags & IOMAP_DIO_USER_BACKED) &&
> + !(dio_flags & IOMAP_DIO_BOUNCE))
> + bio_set_pages_dirty(bio);
> +
> + if (iocb->ki_flags & IOCB_NOWAIT)
> + bio->bi_opf |= REQ_NOWAIT;
> + if ((iocb->ki_flags & IOCB_HIPRI) && !wait_for_completion) {
> + bio->bi_opf |= REQ_POLLED;
> + bio_set_polled(bio, iocb);
> + WRITE_ONCE(iocb->private, bio);
> + }
> +
> + if (wait_for_completion) {
> + sr->waiter = current;
> + blk_crypto_submit_bio(bio);
> + } else {
> + atomic_set(&sr->state, IOMAP_DIO_SIMPLE_SUBMITTING);
> + sr->waiter = NULL;
> + blk_crypto_submit_bio(bio);
> + ret = -EIOCBQUEUED;
> + }
> +
> + if (ops->iomap_end)
> + ops->iomap_end(inode, iomi.pos, count, count, iomi.flags,
> + &iomi.iomap);
> +
> + if (wait_for_completion) {
> + for (;;) {
> + set_current_state(TASK_UNINTERRUPTIBLE);
> + if (!READ_ONCE(sr->waiter))
> + break;
> + blk_io_schedule();
> + }
> + __set_current_state(TASK_RUNNING);
> +
> + ret = iomap_dio_simple_read_finish(iocb, bio,
> + blk_status_to_errno(bio->bi_status));
> + inode_dio_end(inode);
> + trace_iomap_dio_complete(iocb, ret < 0 ? ret : 0,
> + ret > 0 ? ret : 0);
> + } else if (atomic_cmpxchg(&sr->state, IOMAP_DIO_SIMPLE_SUBMITTING,
> + IOMAP_DIO_SIMPLE_QUEUED) ==
> + IOMAP_DIO_SIMPLE_DONE) {
> + ret = iomap_dio_simple_read_complete(iocb, bio);
> + } else {
> + trace_iomap_dio_rw_queued(inode, iomi.pos, count);
> + }
> +
> + return ret;
> +
> +out_bio_release_pages:
> + if (dio_flags & IOMAP_DIO_BOUNCE)
> + bio_iov_iter_unbounce(bio, true, false);
> + else
> + bio_release_pages(bio, false);
> +out_bio_put:
> + bio_put(bio);
> +out_iomap_end:
> + if (ops->iomap_end)
> + ops->iomap_end(inode, iomi.pos, count, 0, iomi.flags,
> + &iomi.iomap);
> + inode_dio_end(inode);
> + return ret;
> +}
> +
> ssize_t
> iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
> unsigned int dio_flags, void *private, size_t done_before)
> {
> struct iomap_dio *dio;
> + ssize_t ret;
> +
> + /*
> + * Fast path for small, block-aligned reads that map to a single
> + * contiguous on-disk extent.
> + *
> + * @dops must be NULL: a non-NULL @dops means the caller wants its
> + * ->end_io / ->submit_io hooks invoked, and in particular wants its
> + * bios to be allocated from the filesystem-private @dops->bio_set
> + * (whose front_pad sizes a filesystem-private wrapper around the
> + * bio). The fast path instead allocates from the shared
> + * iomap_dio_simple_read_pool, whose front_pad matches
> + * struct iomap_dio_simple_read; the two wrappers are not
> + * interchangeable, so we must fall back to __iomap_dio_rw() in
> + * that case.
> + *
> + * @done_before must be zero: a non-zero caller-accumulated residual
> + * cannot be carried through a single-bio inline completion.
> + *
> + * -ENOTBLK is the private sentinel returned by iomap_dio_simple_read()
> + * when it decides the request does not fit the fast path.
> + * In that case we proceed to the generic __iomap_dio_rw() slow
> + * path. Any other errno is a real result and is propagated as-is,
> + * in particular -EAGAIN for IOCB_NOWAIT must reach the caller.
> + */
> + if (!dops && !done_before &&
> + iomap_dio_simple_read_supported(iocb, iter, dio_flags)) {
> + ret = iomap_dio_simple_read(iocb, iter, ops, private, dio_flags);
> + if (ret != -ENOTBLK)
> + return ret;
> + }
>
> dio = __iomap_dio_rw(iocb, iter, ops, dops, dio_flags, private,
> done_before);
> @@ -905,3 +1259,11 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> return iomap_dio_complete(dio);
> }
> EXPORT_SYMBOL_GPL(iomap_dio_rw);
> +
> +static int __init iomap_dio_init(void)
> +{
> + return bioset_init(&iomap_dio_simple_read_pool, 4,
> + offsetof(struct iomap_dio_simple_read, bio),
> + BIOSET_NEED_BVECS | BIOSET_PERCPU_CACHE);
> +}
> +fs_initcall(iomap_dio_init);
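For readers following the three-state rendezvous described in the quoted comment, the hand-off can be modelled in user space with C11 atomics. This is only a rough sketch: the names (sr_model, sr_end_io, sr_submit_return, SR_*) are made up for illustration and are not in the patch, the counters stand in for ki_complete() and the synchronous return, and the atomic_read() fast path in the real end_io is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space names mirroring the quoted enum. */
enum { SR_SUBMITTING = 0, SR_QUEUED, SR_DONE };

struct sr_model {
	atomic_int state;
	int completions_via_callback;	/* stands in for ki_complete() */
	int completions_inline;		/* submitter returns the result itself */
};

/* Completion side: returns true if it delivered the completion itself. */
static bool sr_end_io(struct sr_model *sr)
{
	int expected = SR_SUBMITTING;

	/* SUBMITTING -> DONE: the submitter is still running, leave it to it. */
	if (atomic_compare_exchange_strong(&sr->state, &expected, SR_DONE))
		return false;

	/* The CAS failed, so the submitter already moved the state to QUEUED. */
	assert(expected == SR_QUEUED);
	sr->completions_via_callback++;
	return true;
}

/* Submitter side, after submit: returns true if it must complete inline. */
static bool sr_submit_return(struct sr_model *sr)
{
	int expected = SR_SUBMITTING;

	/* SUBMITTING -> QUEUED: end_io has not run yet and will complete later. */
	if (atomic_compare_exchange_strong(&sr->state, &expected, SR_QUEUED))
		return false;

	/* The CAS failed, so end_io already ran and left the state at DONE. */
	assert(expected == SR_DONE);
	sr->completions_inline++;
	return true;
}
```

Whichever side loses the compare-and-swap owns the completion, so exactly one of the two counters is bumped regardless of ordering.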
^ permalink raw reply
* Re: [PATCH v10 00/22] fs-verity support for XFS with post EOF merkle tree
From: Andrey Albershteyn @ 2026-05-21 9:42 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andrey Albershteyn, linux-xfs, fsverity, linux-fsdevel, ebiggers,
linux-ext4, linux-f2fs-devel, linux-btrfs, linux-unionfs, djwong,
david
In-Reply-To: <20260521090705.GA14254@lst.de>
On 2026-05-21 11:07:05, Christoph Hellwig wrote:
> On Wed, May 20, 2026 at 02:36:58PM +0200, Andrey Albershteyn wrote:
> > This series based on v7.1-rc4.
>
> How are we going to merge this? It touches three subsystem trees
> (fsverity, vfs/iomap, xfs) so some coordination will be needed.
As most of the patches are for xfs, it probably makes sense to go
through the xfs tree.
Carlos, what do you think?
--
- Andrey
^ permalink raw reply
* Re: [PATCH v10 00/22] fs-verity support for XFS with post EOF merkle tree
From: Christoph Hellwig @ 2026-05-21 9:07 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: linux-xfs, fsverity, linux-fsdevel, ebiggers, hch, linux-ext4,
linux-f2fs-devel, linux-btrfs, linux-unionfs, djwong, david
In-Reply-To: <20260520123722.405752-1-aalbersh@kernel.org>
On Wed, May 20, 2026 at 02:36:58PM +0200, Andrey Albershteyn wrote:
> This series based on v7.1-rc4.
How are we going to merge this? It touches three subsystem trees
(fsverity, vfs/iomap, xfs) so some coordination will be needed.
^ permalink raw reply
* [PATCH] ext4: convert legacy ext4_debug() to standard pr_debug()
From: lirongqing @ 2026-05-21 7:46 UTC (permalink / raw)
To: Theodore Ts'o, Andreas Dilger, Baokun Li, Jan Kara,
Ojaswin Mujoo, Ritesh Harjani, Zhang Yi, linux-ext4, linux-kernel
Cc: Li RongQing
From: Li RongQing <lirongqing@baidu.com>
The ext4 file system historically implemented its own debug logging macro
ext4_debug() via EXT4FS_DEBUG conditional compilation. This legacy
implementation suffers from two major drawbacks:
1. It makes two consecutive un-serialized printk() calls, which can
lead to severe log interleaving and corruption under multi-core
concurrent workloads.
2. It completely bypasses the standard modern kernel dynamic debug
(CONFIG_DYNAMIC_DEBUG) infrastructure.
Clean up the legacy implementation by leveraging pr_debug(). This squashes
the multiple printk() calls into a single atomic execution, ensuring
log integrity, while seamlessly hooking ext4 into the kernel's native
dynamic debug framework.
The redundant __FILE__ and __LINE__ macros are intentionally removed from
the string format because the dynamic debug infrastructure can already
append them automatically at runtime (via the '+fl' flags) if desired.
This avoids redundancy and double-logging in modern production/debugging
environments while keeping the macro clean and robust against dangling
comma compiler errors.
Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
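As a rough user-space illustration of the macro shape (not part of the patch): demo_pr_debug() below is a hypothetical stand-in that formats into a caller-supplied buffer, whereas the real pr_debug() is gated by dynamic debug. The sketch only shows that the prefix and the caller's format are joined into one format string and emitted by a single call, and that the GNU `##__VA_ARGS__` extension drops the trailing comma when no arguments are passed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for pr_debug(): one formatting call per message. */
#define demo_pr_debug(buf, len, fmt, ...) \
	snprintf(buf, len, fmt, ##__VA_ARGS__)

/* Same shape as the new ext4_debug(): prefix and fmt become one string. */
#define demo_ext4_debug(buf, len, fmt, ...) \
	demo_pr_debug(buf, len, "EXT4-fs DEBUG %s: " fmt, __func__, ##__VA_ARGS__)

/* Caller that passes a format argument. */
static int demo_with_args(char *buf, size_t len)
{
	assert(len > 0);
	return demo_ext4_debug(buf, len, "freed %d blocks\n", 8);
}

/* Caller with no arguments: the comma before ##__VA_ARGS__ is dropped. */
static int demo_without_args(char *buf, size_t len)
{
	assert(len > 0);
	return demo_ext4_debug(buf, len, "journal flushed\n");
}
```

Because string literal concatenation happens at compile time, the whole message is produced by one call, which is what avoids the interleaving seen with two back-to-back printk() calls.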
fs/ext4/ext4.h | 20 ++------------------
1 file changed, 2 insertions(+), 18 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 94283a9..39e86ff 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -62,24 +62,8 @@
*/
#define DOUBLE_CHECK__
-/*
- * Define EXT4FS_DEBUG to produce debug messages
- */
-#undef EXT4FS_DEBUG
-
-/*
- * Debug code
- */
-#ifdef EXT4FS_DEBUG
-#define ext4_debug(f, a...) \
- do { \
- printk(KERN_DEBUG "EXT4-fs DEBUG (%s, %d): %s:", \
- __FILE__, __LINE__, __func__); \
- printk(KERN_DEBUG f, ## a); \
- } while (0)
-#else
-#define ext4_debug(fmt, ...) no_printk(fmt, ##__VA_ARGS__)
-#endif
+#define ext4_debug(fmt, ...) \
+ pr_debug("EXT4-fs DEBUG %s: " fmt, __func__, ##__VA_ARGS__)
/*
* Turn on EXT_DEBUG to enable ext4_ext_show_path/leaf/move in extents.c
--
2.9.4
^ permalink raw reply related
* [syzbot] [mm?] [ext4?] BUG: unable to handle kernel NULL pointer dereference in qlist_free_all (10)
From: syzbot @ 2026-05-21 3:54 UTC (permalink / raw)
To: akpm, jannh, liam, linux-ext4, linux-kernel, linux-mm, ljs,
pfalcato, syzkaller-bugs, vbabka
Hello,
syzbot found the following issue on:
HEAD commit: df685633c3db Merge tag 'rcu-fixes.v7.1-20260519a' of git:/..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=17beec2e580000
kernel config: https://syzkaller.appspot.com/x/.config?x=d0f0911eedbc130a
dashboard link: https://syzkaller.appspot.com/bug?extid=741fee3eb7f4c4e6992a
compiler: gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=110e6b06580000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=146b4c2e580000
Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/90074e46cb62/disk-df685633.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/691247547753/vmlinux-df685633.xz
kernel image: https://storage.googleapis.com/syzbot-assets/e1c705a2acac/bzImage-df685633.xz
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+741fee3eb7f4c4e6992a@syzkaller.appspotmail.com
BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 800000007c8b6067 P4D 800000007c8b6067 PUD 0
Oops: Oops: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 7032 Comm: syz.0.224 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026
RIP: 0010:qlink_to_cache mm/kasan/quarantine.c:131 [inline]
RIP: 0010:qlist_free_all+0x8d/0xf0 mm/kasan/quarantine.c:176
Code: 48 89 c2 48 c1 e2 06 48 03 15 1f 8a a7 0b 48 8b 42 08 48 89 c1 83 e1 01 48 83 e9 01 48 09 c8 48 21 d0 80 78 33 f5 49 0f 45 c5 <48> 8b 68 08 eb 88 48 83 7d 40 00 75 9b 66 f7 45 08 04 02 75 93 8b
RSP: 0018:ffffc9000654f428 EFLAGS: 00010206
RAX: 0000000000000000 RBX: ffff88802c2a8000 RCX: ffffffffffffffff
RDX: ffffea0000b0aa00 RSI: ffffffff8df1f2e9 RDI: ffff88802c2a8000
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff88802c2a8000
R13: 0000000000000000 R14: ffffc9000654f458 R15: 0000000000000100
FS: 00007fa39d6a46c0(0000) GS:ffff888124374000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 00000000757fe000 CR4: 00000000003526f0
Call Trace:
<TASK>
kasan_quarantine_reduce+0x1a0/0x1f0 mm/kasan/quarantine.c:286
__kasan_slab_alloc+0x69/0x90 mm/kasan/common.c:350
kasan_slab_alloc include/linux/kasan.h:253 [inline]
slab_post_alloc_hook mm/slub.c:4569 [inline]
slab_alloc_node mm/slub.c:4898 [inline]
kmem_cache_alloc_noprof+0x241/0x6e0 mm/slub.c:4905
vm_area_alloc+0x1f/0x160 mm/vma_init.c:32
__mmap_new_vma mm/vma.c:2547 [inline]
__mmap_region+0x104d/0x2da0 mm/vma.c:2771
mmap_region+0x35d/0x620 mm/vma.c:2857
do_mmap+0xc63/0x12f0 mm/mmap.c:560
vm_mmap_pgoff+0x29e/0x470 mm/util.c:581
ksys_mmap_pgoff+0xe4/0x610 mm/mmap.c:606
__do_sys_mmap arch/x86/kernel/sys_x86_64.c:89 [inline]
__se_sys_mmap arch/x86/kernel/sys_x86_64.c:82 [inline]
__x64_sys_mmap+0x125/0x190 arch/x86/kernel/sys_x86_64.c:82
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x10b/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fa39c79ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fa39d6a4028 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
RAX: ffffffffffffffda RBX: 00007fa39ca15fa0 RCX: 00007fa39c79ce59
RDX: 00000000000000db RSI: 0000000004020009 RDI: 0000000000000000
RBP: 00007fa39c832d6f R08: 0000000000000401 R09: 0000000000008000
R10: 0000000000000eb1 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fa39ca16038 R14: 00007fa39ca15fa0 R15: 00007fffb789fb58
</TASK>
Modules linked in:
CR2: 0000000000000008
---[ end trace 0000000000000000 ]---
RIP: 0010:qlink_to_cache mm/kasan/quarantine.c:131 [inline]
RIP: 0010:qlist_free_all+0x8d/0xf0 mm/kasan/quarantine.c:176
Code: 48 89 c2 48 c1 e2 06 48 03 15 1f 8a a7 0b 48 8b 42 08 48 89 c1 83 e1 01 48 83 e9 01 48 09 c8 48 21 d0 80 78 33 f5 49 0f 45 c5 <48> 8b 68 08 eb 88 48 83 7d 40 00 75 9b 66 f7 45 08 04 02 75 93 8b
RSP: 0018:ffffc9000654f428 EFLAGS: 00010206
RAX: 0000000000000000 RBX: ffff88802c2a8000 RCX: ffffffffffffffff
RDX: ffffea0000b0aa00 RSI: ffffffff8df1f2e9 RDI: ffff88802c2a8000
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff88802c2a8000
R13: 0000000000000000 R14: ffffc9000654f458 R15: 0000000000000100
FS: 00007fa39d6a46c0(0000) GS:ffff888124374000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 00000000757fe000 CR4: 00000000003526f0
----------------
Code disassembly (best guess):
0: 48 89 c2 mov %rax,%rdx
3: 48 c1 e2 06 shl $0x6,%rdx
7: 48 03 15 1f 8a a7 0b add 0xba78a1f(%rip),%rdx # 0xba78a2d
e: 48 8b 42 08 mov 0x8(%rdx),%rax
12: 48 89 c1 mov %rax,%rcx
15: 83 e1 01 and $0x1,%ecx
18: 48 83 e9 01 sub $0x1,%rcx
1c: 48 09 c8 or %rcx,%rax
1f: 48 21 d0 and %rdx,%rax
22: 80 78 33 f5 cmpb $0xf5,0x33(%rax)
26: 49 0f 45 c5 cmovne %r13,%rax
* 2a: 48 8b 68 08 mov 0x8(%rax),%rbp <-- trapping instruction
2e: eb 88 jmp 0xffffffb8
30: 48 83 7d 40 00 cmpq $0x0,0x40(%rbp)
35: 75 9b jne 0xffffffd2
37: 66 f7 45 08 04 02 testw $0x204,0x8(%rbp)
3d: 75 93 jne 0xffffffd2
3f: 8b .byte 0x8b
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.
If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)
If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report
If you want to undo deduplication, reply with:
#syz undup
^ permalink raw reply
* [PATCH 2/2] fs: jbd2: use clear_and_wake_up_bit() in journal_end_buffer_io_sync()
From: Agatha Isabelle Moreira @ 2026-05-20 20:05 UTC (permalink / raw)
To: linux-ext4, linux-kernel, linux-fsdevel, Theodore Ts'o,
Jan Kara, shuo chen, linux-kernel-mentees, shuah, patch-reply
In-Reply-To: <ag4PEP52c8rxrYPc@guidai>
Use `clear_and_wake_up_bit()` in `journal_end_buffer_io_sync()`, since
the helper was introduced in 'commit 8236b0ae31c83 ("bdi: wake up
concurrent wb_shutdown() callers.")' as a generic way of doing the same
sequence of operations:
clear_bit_unlock();
smp_mb__after_atomic();
wake_up_bit();
The helper was first implemented to avoid bugs caused by forgetting to
call `wake_up_bit()` after `clear_bit_unlock()`.
Since `journal_end_buffer_io_sync()` was first introduced by 'commit
470decc613ab2 ("jbd2: initial copy of files from jbd")' and last
modified in this operation by 'commit 4e857c58efeb9 ("arch: Mass
conversion of smp_mb__*()")', years before `clear_and_wake_up_bit()`, it
still uses the open-coded sequence.
Replace the open-coded sequence with the helper to avoid duplicate code
and reduce code paths to maintain.
Suggested-by: shuo chen <1289151713@qq.com>
Link: https://lore.kernel.org/kernelnewbies/agzoqV835-co4kAN@guidai/T/#t
Signed-off-by: Agatha Isabelle Moreira <code@agatha.dev>
---
fs/jbd2/commit.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 8cf61e7185c4..b647fde76e49 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -39,9 +39,7 @@ static void journal_end_buffer_io_sync(struct buffer_head *bh, int uptodate)
else
clear_buffer_uptodate(bh);
if (orig_bh) {
- clear_bit_unlock(BH_Shadow, &orig_bh->b_state);
- smp_mb__after_atomic();
- wake_up_bit(&orig_bh->b_state, BH_Shadow);
+ clear_and_wake_up_bit(BH_Shadow, &orig_bh->b_state);
}
unlock_buffer(bh);
}
--
2.53.0
^ permalink raw reply related
* [PATCH 1/2] fs: buffer: use clear_and_wake_up_bit() in unlock_buffer()
From: Agatha Isabelle Moreira @ 2026-05-20 19:58 UTC (permalink / raw)
To: linux-fsdevel, linux-ext4, linux-kernel, Alexander Viro,
Christian Brauner, Jan Kara, shuo chen, linux-kernel-mentees,
shuah, patch-reply
In-Reply-To: <ag4PEP52c8rxrYPc@guidai>
Use `clear_and_wake_up_bit()` in `unlock_buffer()`, since the helper was
introduced in 'commit 8236b0ae31c83 ("bdi: wake up concurrent
wb_shutdown() callers.")' as a generic way of doing the same sequence of
operations:
clear_bit_unlock();
smp_mb__after_atomic();
wake_up_bit();
The helper was implemented to avoid bugs caused by forgetting to call
`wake_up_bit()` after `clear_bit_unlock()`.
Since `unlock_buffer()` predates git and was last modified in
'commit 4e857c58efeb9 ("arch: Mass conversion of smp_mb__*()")', years
before `clear_and_wake_up_bit()`, it still uses the open-coded sequence.
Replace the open-coded sequence with the helper to avoid duplicate code
and reduce code paths to maintain.
Suggested-by: shuo chen <1289151713@qq.com>
Link: https://lore.kernel.org/kernelnewbies/agzoqV835-co4kAN@guidai/T/#t
Signed-off-by: Agatha Isabelle Moreira <code@agatha.dev>
---
fs/buffer.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index b0b3792b1496..4348b240bd97 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -74,9 +74,7 @@ EXPORT_SYMBOL(__lock_buffer);
void unlock_buffer(struct buffer_head *bh)
{
- clear_bit_unlock(BH_Lock, &bh->b_state);
- smp_mb__after_atomic();
- wake_up_bit(&bh->b_state, BH_Lock);
+ clear_and_wake_up_bit(BH_Lock, &bh->b_state);
}
EXPORT_SYMBOL(unlock_buffer);
--
2.53.0
^ permalink raw reply related
* [PATCH 0/2] fs: refactor code to use clear_and_wake_up_bit()
From: Agatha Isabelle Moreira @ 2026-05-20 19:45 UTC (permalink / raw)
To: linux-fsdevel, linux-ext4, linux-kernel, Christian Brauner,
Jan Kara, shuo chen, Theodore Ts'o, linux-kernel-mentees,
shuah, patch-reply
Refactor code to use `clear_and_wake_up_bit()` instead of manual calls
to:
clear_bit_unlock();
smp_mb__after_atomic();
wake_up_bit();
The helper function `clear_and_wake_up_bit()` was introduced in
'commit 8236b0ae31c83 ("bdi: wake up concurrent wb_shutdown()
callers.")' as a generic way of doing the same sequence of operations,
but several open-coded instances remain.
Replace the manual calls with a single call to
`clear_and_wake_up_bit()` to deduplicate code and standardize pathways.
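To make the equivalence concrete, here is a minimal user-space sketch of the three-step sequence folded into one helper, using C11 atomics. All names (`demo_word`, `demo_wake_up_bit()`, `demo_clear_and_wake_up_bit()`, `demo_unlock_buffer()`, `DEMO_BH_LOCK`) are hypothetical, and the wake-up is modelled as a counter rather than a real wait queue, so this only illustrates the ordering of the steps and is not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>

#define DEMO_BH_LOCK 2	/* hypothetical bit number, for illustration only */

/* Hypothetical stand-in for a state word plus its bit wait queue. */
struct demo_word {
	atomic_ulong state;
	int wakeups;	/* how many times the wake-up step ran */
};

/* Models wake_up_bit(): here it only records that waiters were woken. */
static void demo_wake_up_bit(struct demo_word *w, unsigned int bit)
{
	(void)bit;
	w->wakeups++;
}

/* Helper form: the clear, the barrier and the wake-up always go together. */
static void demo_clear_and_wake_up_bit(unsigned int bit, struct demo_word *w)
{
	assert(bit < 8 * sizeof(unsigned long));
	/* clear_bit_unlock(): clear the bit with release ordering */
	atomic_fetch_and_explicit(&w->state, ~(1UL << bit),
				  memory_order_release);
	/* smp_mb__after_atomic(): full barrier before checking for waiters */
	atomic_thread_fence(memory_order_seq_cst);
	/* wake_up_bit(): notify anyone sleeping on this bit */
	demo_wake_up_bit(w, bit);
}

/* Call site after the conversion: one line instead of three. */
static void demo_unlock_buffer(struct demo_word *w)
{
	demo_clear_and_wake_up_bit(DEMO_BH_LOCK, w);
}
```

A call site reduced to one line cannot accidentally omit the barrier or the wake-up, which is the class of bug the helper was introduced to prevent.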
TESTING
=======
Boot-tested on an x86_64 QEMU virtual machine. Basic filesystem
operations (create, delete, sync) were performed on an ext4 filesystem
mounted in `data=journal` mode. No issues were observed.
Suggested-by: shuo chen <1289151713@qq.com>
Link: https://lore.kernel.org/kernelnewbies/agzoqV835-co4kAN@guidai/T/#t
Signed-off-by: Agatha Isabelle Moreira <code@agatha.dev>
---
Agatha Isabelle Moreira (2):
fs: buffer: use clear_and_wake_up_bit() in unlock_buffer()
fs: jbd2: use clear_and_wake_up_bit() in journal_end_buffer_io_sync()
fs/buffer.c | 4 +---
fs/jbd2/commit.c | 4 +---
2 files changed, 2 insertions(+), 6 deletions(-)
--
2.53.0
^ permalink raw reply
* Re: [PATCH v10 03/22] ovl: use core fsverity ensure info interface
From: Eric Biggers @ 2026-05-20 19:07 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: linux-xfs, fsverity, linux-fsdevel, hch, linux-ext4,
linux-f2fs-devel, linux-btrfs, linux-unionfs, djwong,
Amir Goldstein
In-Reply-To: <20260520123722.405752-4-aalbersh@kernel.org>
On Wed, May 20, 2026 at 02:37:01PM +0200, Andrey Albershteyn wrote:
> fsverity now exposes fsverity_ensure_verity_info() which could be used
> instead of opening file to ensure that fsverity info is loaded and
> attached to inode.
>
> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> Acked-by: Amir Goldstein <amir73il@gmail.com>
> ---
> fs/overlayfs/util.c | 14 +++-----------
> 1 file changed, 3 insertions(+), 11 deletions(-)
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
I'm still confused by the new implementation of fsverity_active() that
got introduced by "fsverity: use a hashtable to find the fsverity_info",
though. I should have caught this during review of that commit. For
one its comment is outdated, but also the memory barrier seems to be
specific to the fsverity_get_info() caller and probably should be moved
to there. Anyway, that's not directly related to this patch.
- Eric
^ permalink raw reply
* Re: [PATCH v14 03/15] fat: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-05-20 17:30 UTC (permalink / raw)
To: Mark Brown
Cc: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
linux-ext4, linux-xfs, linux-cifs, linux-nfs, linux-api,
linux-f2fs-devel, OGAWA Hirofumi, Namjae Jeon, Sungjong Seo,
Yuezhang Mo, almaz.alexandrovich, Viacheslav Dubeyko,
John Paul Adrian Glaubitz, frank.li, Theodore Tso, adilger.kernel,
Carlos Maiolino, Steve French, Paulo Alcantara, Ronnie Sahlberg,
Shyam Prasad N, Trond Myklebust, Anna Schumaker, Jaegeuk Kim,
Chao Yu, Hans de Goede, senozhatsky, Chuck Lever, Roland Mainz
In-Reply-To: <cdeaab82-06bf-47c1-8f6c-4e40dbec2344@sirena.org.uk>
On Wed, May 20, 2026, at 1:11 PM, Mark Brown wrote:
> On Wed, May 20, 2026 at 12:58:22PM -0400, Chuck Lever wrote:
>> The first option is the narrowest kernel-side change, and
>> matches what other minimal-fileattr filesystems do.
>
> That sounds like a good idea regardless of what we do with the test?
Yes, I have no objection to this approach, but it would be great to
hear from the vfat maintainers/contributors on this one before I
dig in.
--
Chuck Lever
^ permalink raw reply
* Re: [PATCH v14 03/15] fat: Implement fileattr_get for case sensitivity
From: Mark Brown @ 2026-05-20 17:11 UTC (permalink / raw)
To: Chuck Lever
Cc: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
linux-ext4, linux-xfs, linux-cifs, linux-nfs, linux-api,
linux-f2fs-devel, OGAWA Hirofumi, Namjae Jeon, Sungjong Seo,
Yuezhang Mo, almaz.alexandrovich, Viacheslav Dubeyko,
John Paul Adrian Glaubitz, frank.li, Theodore Tso, adilger.kernel,
Carlos Maiolino, Steve French, Paulo Alcantara, Ronnie Sahlberg,
Shyam Prasad N, Trond Myklebust, Anna Schumaker, Jaegeuk Kim,
Chao Yu, Hans de Goede, senozhatsky, Chuck Lever, Roland Mainz
In-Reply-To: <858d7233-1d9c-48f4-aa4f-c5a9f6e1f5dc@app.fastmail.com>
On Wed, May 20, 2026 at 12:58:22PM -0400, Chuck Lever wrote:
> On Wed, May 20, 2026, at 11:19 AM, Mark Brown wrote:
> > Yes, it's the only one showing as failing - there are four failures
> > corresponding to the four tests done for vfat.
> 03/15 adds .fileattr_get = fat_fileattr_get for both
> fat_file_inode_operations and vfat_dir_inode_operations. LTP
> opens a directory (SAFE_OPEN(TESTDIR, O_RDONLY|O_DIRECTORY)),
> so FS_IOC_GETFLAGS on the dir now succeeds, and statx04
> proceeds where it was previously skipped.
> AFAICS, 03/15 did not change pre-existing kernel behavior of
> stx_attributes_mask on vfat. It merely converted a "skipped"
> LTP outcome into an "executed but failed" outcome.
Ah, that's an interesting issue with the way the test reports. LTP
could use nested reports a la TAP here so we're not just seeing the top
level failure from the test case in automation.
> Fix options:
> * fat_getattr() could call generic_fill_statx_attr(inode, stat),
> which advertises KSTAT_ATTR_VFS_FLAGS (IMMUTABLE + APPEND).
> That clears 2 of 4 TFAILs but not COMPRESSED/NODUMP, which
> FAT genuinely does not back.
...
> * Admit the LTP statx04 test needs to be updated.
> FS_IOC_GETFLAGS succeeding does not logically imply all four
> FS_IOC_FLAGS-mapped STATX_ATTR_* bits are supported. The
> test's gate is too coarse for filesystems that gained a
> narrowly-scoped fileattr_get (just casefold/immutable). The
> test's tag list pins it to filesystems that do support the
> full set, but vfat was tacitly excluded by the prior ENOTTY.
I think this is needed; it's hardly the first LTP test to make
unwarranted assumptions about the kernel APIs. I'll try to look into
it.
> The first option is the narrowest kernel-side change, and
> matches what other minimal-fileattr filesystems do.
That sounds like a good idea regardless of what we do with the test?
Thanks for looking into this so quickly and thoroughly.
* Re: [PATCH v14 03/15] fat: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-05-20 16:58 UTC (permalink / raw)
To: Mark Brown
Cc: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
linux-ext4, linux-xfs, linux-cifs, linux-nfs, linux-api,
linux-f2fs-devel, OGAWA Hirofumi, Namjae Jeon, Sungjong Seo,
Yuezhang Mo, almaz.alexandrovich, Viacheslav Dubeyko,
John Paul Adrian Glaubitz, frank.li, Theodore Tso, adilger.kernel,
Carlos Maiolino, Steve French, Paulo Alcantara, Ronnie Sahlberg,
Shyam Prasad N, Trond Myklebust, Anna Schumaker, Jaegeuk Kim,
Chao Yu, Hans de Goede, senozhatsky, Chuck Lever, Roland Mainz
In-Reply-To: <3a347b64-f91b-450f-b27d-26ea6810b960@sirena.org.uk>
On Wed, May 20, 2026, at 11:19 AM, Mark Brown wrote:
> On Wed, May 20, 2026 at 11:12:51AM -0400, Chuck Lever wrote:
>> On Wed, May 20, 2026, at 10:54 AM, Mark Brown wrote:
>
>> > It's not testing tmpfs (well, it does but that passed), as the log above
>> > shows it is making a vfat filesystem on a loop device backed by a file
>> > that happens to be in a tmpfs and then testing that. There's a bunch of
>> > filesystems covered in this manner:
>
>> OK. Is vfat the only failure in LTP statx04 ?
>
> Yes, it's the only one showing as failing - there are four failures
> corresponding to the four tests done for vfat.
03/15 adds .fileattr_get = fat_fileattr_get for both
fat_file_inode_operations and vfat_dir_inode_operations. LTP
opens a directory (SAFE_OPEN(TESTDIR, O_RDONLY|O_DIRECTORY)),
so FS_IOC_GETFLAGS on the dir now succeeds, and statx04
proceeds where it was previously skipped.
AFAICS, 03/15 did not change pre-existing kernel behavior of
stx_attributes_mask on vfat. It merely converted a "skipped"
LTP outcome into an "executed but failed" outcome.
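To make the gating concrete, here is a minimal userspace sketch of
the probe described above. It is not LTP's actual code: the helper
name getflags_supported() and its return convention are invented for
illustration, and only open(), ioctl(), and FS_IOC_GETFLAGS are real
interfaces. It assumes ENOTTY is the "not supported" signal, as in
the pre-03/15 vfat behavior.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* FS_IOC_GETFLAGS */

/*
 * Hypothetical helper (not part of LTP or the kernel).
 * Returns  1 if FS_IOC_GETFLAGS succeeds on the directory,
 *          0 if the filesystem rejects it with ENOTTY,
 *         -1 on any other error (errno is preserved).
 */
static int getflags_supported(const char *dir)
{
	int flags = 0;
	int fd, ret, saved;

	fd = open(dir, O_RDONLY | O_DIRECTORY);
	if (fd < 0)
		return -1;

	ret = ioctl(fd, FS_IOC_GETFLAGS, &flags);
	saved = errno;
	close(fd);

	if (ret == 0)
		return 1;
	if (saved == ENOTTY)
		return 0;

	errno = saved;
	return -1;
}
```

A test that skips on 0 and runs its attribute checks on 1 would have
skipped vfat before 03/15 and would run (and fail) afterwards, which
is the change in outcome described above.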
Fix options:
* fat_getattr() could call generic_fill_statx_attr(inode, stat),
which advertises KSTAT_ATTR_VFS_FLAGS (IMMUTABLE + APPEND).
That clears 2 of 4 TFAILs but not COMPRESSED/NODUMP, which
FAT genuinely does not back.
* Set stat->attributes_mask |= KSTAT_ATTR_FS_IOC_FLAGS in
fat_getattr(). Honest only to the extent that FAT now exposes
some FS_*_FL bits via fileattr. This would silence the test
failures, but advertises capabilities (COMPRESSED, NODUMP)
FAT doesn't track.
* Admit the LTP statx04 test needs to be updated.
FS_IOC_GETFLAGS succeeding does not logically imply all four
FS_IOC_FLAGS-mapped STATX_ATTR_* bits are supported. The
test's gate is too coarse for filesystems that gained a
narrowly-scoped fileattr_get (just casefold/immutable). The
test's tag list pins it to filesystems that do support the
full set, but vfat was tacitly excluded by the prior ENOTTY.
The first option is the narrowest kernel-side change, and
matches what other minimal-fileattr filesystems do.
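For anyone who wants to check a mount by hand, the following is a
rough sketch of the comparison at issue: which of the four
STATX_ATTR_* bits are absent from stx_attributes_mask. The helper
names missing_attrs() and query_missing_attrs() are made up for this
example, and it assumes a glibc new enough to provide statx() (2.28
or later). It is an illustration, not the statx04 source.

```c
#define _GNU_SOURCE
#include <fcntl.h>      /* AT_FDCWD */
#include <stdint.h>
#include <sys/stat.h>   /* statx(), struct statx, STATX_ATTR_* */

/* The four attribute bits discussed in this thread. */
static const uint64_t wanted_attrs =
	STATX_ATTR_COMPRESSED | STATX_ATTR_APPEND |
	STATX_ATTR_IMMUTABLE | STATX_ATTR_NODUMP;

/* Hypothetical helper: bits from wanted_attrs that the mask lacks. */
static uint64_t missing_attrs(uint64_t attributes_mask)
{
	return wanted_attrs & ~attributes_mask;
}

/*
 * Hypothetical helper: query a path and report the missing bits.
 * Returns 0 on success, -1 if statx() fails.
 */
static int query_missing_attrs(const char *path, uint64_t *missing)
{
	struct statx buf;

	if (statx(AT_FDCWD, path, 0, 0, &buf) != 0)
		return -1;

	*missing = missing_attrs(buf.stx_attributes_mask);
	return 0;
}
```

With a mask of only IMMUTABLE | APPEND, as the first option would
advertise, missing_attrs() returns COMPRESSED | NODUMP, i.e. the two
remaining TFAILs.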
--
Chuck Lever
* Re: [PATCH v14 03/15] fat: Implement fileattr_get for case sensitivity
From: Mark Brown @ 2026-05-20 15:19 UTC (permalink / raw)
To: Chuck Lever
Cc: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
linux-ext4, linux-xfs, linux-cifs, linux-nfs, linux-api,
linux-f2fs-devel, OGAWA Hirofumi, Namjae Jeon, Sungjong Seo,
Yuezhang Mo, almaz.alexandrovich, Viacheslav Dubeyko,
John Paul Adrian Glaubitz, frank.li, Theodore Tso, adilger.kernel,
Carlos Maiolino, Steve French, Paulo Alcantara, Ronnie Sahlberg,
Shyam Prasad N, Trond Myklebust, Anna Schumaker, Jaegeuk Kim,
Chao Yu, Hans de Goede, senozhatsky, Chuck Lever, Roland Mainz
In-Reply-To: <8b750b3f-4d73-41f3-84fb-6e387fd24168@app.fastmail.com>
On Wed, May 20, 2026 at 11:12:51AM -0400, Chuck Lever wrote:
> On Wed, May 20, 2026, at 10:54 AM, Mark Brown wrote:
> > It's not testing tmpfs (well, it does but that passed), as the log above
> > shows it is making a vfat filesystem on a loop device backed by a file
> > that happens to be in a tmpfs and then testing that. There's a bunch of
> > filesystems covered in this manner:
> OK. Is vfat the only failure in LTP statx04 ?
Yes, it's the only one showing as failing - there are four failures
corresponding to the four tests done for vfat. It's only testing a
subset of filesystems (a combination of what the test knows about and
what's available at runtime with the kernel and rootfs).
Like I say there's a full log available at:
https://lava.sirena.org.uk/scheduler/job/2778994#L6373
* Re: [PATCH v14 03/15] fat: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-05-20 15:12 UTC (permalink / raw)
To: Mark Brown
Cc: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
linux-ext4, linux-xfs, linux-cifs, linux-nfs, linux-api,
linux-f2fs-devel, OGAWA Hirofumi, Namjae Jeon, Sungjong Seo,
Yuezhang Mo, almaz.alexandrovich, Viacheslav Dubeyko,
John Paul Adrian Glaubitz, frank.li, Theodore Tso, adilger.kernel,
Carlos Maiolino, Steve French, Paulo Alcantara, Ronnie Sahlberg,
Shyam Prasad N, Trond Myklebust, Anna Schumaker, Jaegeuk Kim,
Chao Yu, Hans de Goede, senozhatsky, Chuck Lever, Roland Mainz
In-Reply-To: <a366645c-364d-4588-8a15-4cd446f64366@sirena.org.uk>
On Wed, May 20, 2026, at 10:54 AM, Mark Brown wrote:
> On Wed, May 20, 2026 at 10:39:16AM -0400, Chuck Lever wrote:
>> On Wed, May 20, 2026, at 10:31 AM, Mark Brown wrote:
>> > On Thu, May 07, 2026 at 04:52:56AM -0400, Chuck Lever wrote:
>
>> > I'm seeing a regression in -next with the LTP statx04 test which bisects
>> > to this commit:
>
>> > tst_tmpdir.c:316: TINFO: Using /tmp/LTP_sta8hUyB4 as tmpdir (tmpfs
>> > filesystem)
>> > tst_device.c:98: TINFO: Found free device 0 '/dev/loop0'
>> > tst_test.c:2047: TINFO: LTP version: 20260130
>> > tst_test.c:2050: TINFO: Tested kernel: 7.1.0-rc4-next-20260520 #1 SMP
>> > PREEMPT @1779279361 aarch64
>
>> > ...
>
>> > tst_test.c:1985: TINFO: === Testing on vfat ===
>> > tst_test.c:1290: TINFO: Formatting /dev/loop0 with vfat opts='' extra
>> > opts=''
>> > tst_test.c:1302: TINFO: Mounting /dev/loop0 to
>> > /tmp/LTP_sta8hUyB4/mntpoint fstyp=vfat flags=0
>> > statx04.c:121: TFAIL: STATX_ATTR_COMPRESSED not supported
>> > statx04.c:121: TFAIL: STATX_ATTR_APPEND not supported
>> > statx04.c:121: TFAIL: STATX_ATTR_IMMUTABLE not supported
>> > statx04.c:121: TFAIL: STATX_ATTR_NODUMP not supported
>
>> At first blush, that does not seem like a plausible bisect
>> result. This commit shouldn't affect the behavior of tmpfs
>> in any way.
>
> It's not testing tmpfs (well, it does but that passed), as the log above
> shows it is making a vfat filesystem on a loop device backed by a file
> that happens to be in a tmpfs and then testing that. There's a bunch of
> filesystems covered in this manner:
>
> tst_test.c:1985: TINFO: === Testing on ext2 ===
> tst_test.c:1985: TINFO: === Testing on ext3 ===
> tst_test.c:1985: TINFO: === Testing on ext4 ===
> tst_test.c:1985: TINFO: === Testing on btrfs ===
> tst_test.c:1985: TINFO: === Testing on vfat ===
> tst_test.c:1985: TINFO: === Testing on tmpfs ===
OK. Is vfat the only failure in LTP statx04 ?
--
Chuck Lever
* Re: [PATCH v14 03/15] fat: Implement fileattr_get for case sensitivity
From: Mark Brown @ 2026-05-20 14:54 UTC (permalink / raw)
To: Chuck Lever
Cc: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
linux-ext4, linux-xfs, linux-cifs, linux-nfs, linux-api,
linux-f2fs-devel, OGAWA Hirofumi, Namjae Jeon, Sungjong Seo,
Yuezhang Mo, almaz.alexandrovich, Viacheslav Dubeyko,
John Paul Adrian Glaubitz, frank.li, Theodore Tso, adilger.kernel,
Carlos Maiolino, Steve French, Paulo Alcantara, Ronnie Sahlberg,
Shyam Prasad N, Trond Myklebust, Anna Schumaker, Jaegeuk Kim,
Chao Yu, Hans de Goede, senozhatsky, Chuck Lever, Roland Mainz
In-Reply-To: <04302551-3628-4036-9a3f-596cb782f5b7@app.fastmail.com>
On Wed, May 20, 2026 at 10:39:16AM -0400, Chuck Lever wrote:
> On Wed, May 20, 2026, at 10:31 AM, Mark Brown wrote:
> > On Thu, May 07, 2026 at 04:52:56AM -0400, Chuck Lever wrote:
> > I'm seeing a regression in -next with the LTP statx04 test which bisects
> > to this commit:
> > tst_tmpdir.c:316: TINFO: Using /tmp/LTP_sta8hUyB4 as tmpdir (tmpfs
> > filesystem)
> > tst_device.c:98: TINFO: Found free device 0 '/dev/loop0'
> > tst_test.c:2047: TINFO: LTP version: 20260130
> > tst_test.c:2050: TINFO: Tested kernel: 7.1.0-rc4-next-20260520 #1 SMP
> > PREEMPT @1779279361 aarch64
> > ...
> > tst_test.c:1985: TINFO: === Testing on vfat ===
> > tst_test.c:1290: TINFO: Formatting /dev/loop0 with vfat opts='' extra
> > opts=''
> > tst_test.c:1302: TINFO: Mounting /dev/loop0 to
> > /tmp/LTP_sta8hUyB4/mntpoint fstyp=vfat flags=0
> > statx04.c:121: TFAIL: STATX_ATTR_COMPRESSED not supported
> > statx04.c:121: TFAIL: STATX_ATTR_APPEND not supported
> > statx04.c:121: TFAIL: STATX_ATTR_IMMUTABLE not supported
> > statx04.c:121: TFAIL: STATX_ATTR_NODUMP not supported
> At first blush, that does not seem like a plausible bisect
> result. This commit shouldn't affect the behavior of tmpfs
> in any way.
It's not testing tmpfs (well, it does but that passed), as the log above
shows it is making a vfat filesystem on a loop device backed by a file
that happens to be in a tmpfs and then testing that. There's a bunch of
filesystems covered in this manner:
tst_test.c:1985: TINFO: === Testing on ext2 ===
tst_test.c:1985: TINFO: === Testing on ext3 ===
tst_test.c:1985: TINFO: === Testing on ext4 ===
tst_test.c:1985: TINFO: === Testing on btrfs ===
tst_test.c:1985: TINFO: === Testing on vfat ===
tst_test.c:1985: TINFO: === Testing on tmpfs ===