Linux EXT4 FS development

* Re: [PATCH v3 v3 2/2] ext4: allow clearing mballoc stats through mb_stats
From: Theodore Tso @ 2026-04-23 16:19 UTC (permalink / raw)
  To: Baolin Liu
  Cc: adilger.kernel, ojaswin, ritesh.list, yi.zhang, linux-ext4,
	linux-kernel, wangguanyu, Baolin Liu, Andreas Dilger
In-Reply-To: <20260422015026.7170-3-liubaolin12138@163.com>

On Wed, Apr 22, 2026 at 09:50:25AM +0800, Baolin Liu wrote:
> From: Baolin Liu <liubaolin@kylinos.cn>
> 
> Make /proc/fs/ext4/<dev>/mb_stats writable and clear the runtime
> mballoc statistics when 0 is written.

At the moment, to enable mb_stats the system administrator needs to
write "1" to /sys/fs/ext4/<dev>/mb_stats, and writing "0" to the sysfs
file will pause the statistics collection (but not clear the
statistics).  Adding a way to clear the statistics by writing to the
procfs file might be confusing to users.

So... as a suggestion, if you're adding the ability to write to
/proc/fs/.../mb_stats, what if we make things work like this:

   * Write 1 to /proc/fs/.../mb_stats to enable statistics collection
   * Write 0 to /proc/fs/.../mb_stats to disable statistics collection
   * Write -1 to /proc/fs/.../mb_stats to clear statistics counters

And then deprecate the /sys/fs/.../mb_stats variable (but we probably
won't be able to remove it for at least a year or two).

	       	       	  	    	- Ted


* [BUG] ext4: KCSAN: lockless i_es_all_nr reads in es_shrinker_info
From: Shuhao Fu @ 2026-04-23 17:37 UTC (permalink / raw)
  To: Theodore Ts'o, linux-ext4; +Cc: linux-kernel

Hi,

Reading /proc/fs/ext4/<sb>/es_shrinker_info can overlap with extent-status
updates and trigger KCSAN reports on the per-inode ES counters (I saw this on
i_es_all_nr; i_es_shk_nr is read the same way in this proc path). From what I
can see, the user-visible impact appears limited to stale/inconsistent procfs
stats output (I do not have evidence of corruption or crash from this path).

I reproduced this on a local KCSAN-instrumented tree based on linux commit
d8a9a4b11a13, using an x86_64 QEMU workload with userspace reader/writer loops.
To increase the race window, I added small debug-only hooks in my local tree:
after the writer updates the counter, it briefly delays and records which inode
it just touched; the proc reader then samples that inode's counters during the
s_es_list walk. I also wrapped the i_es_all_nr load in a local helper
ext4_es_shrinker_read_all_nr() so the read-side stack has a stable symbol;
upstream reads happen directly in ext4_seq_es_shrinker_info_show().

With that setup, KCSAN prints the following summary line (naming the two
racing functions):

  BUG: KCSAN: data-race in ext4_es_init_extent / ext4_es_shrinker_read_all_nr

The first clean hit in my local log was:

  read to 0xffff917cc15222c8 of 4 bytes by task 107 on cpu 0:
   ext4_es_shrinker_read_all_nr+0x26/0x50
   ext4_es_kcsan_probe_hot_inode+0x2b9/0x400
   ext4_seq_es_shrinker_info_show+0x9b/0xd40
   ...
   __x64_sys_sendfile64+0xc2/0x100
   do_syscall_64+0x13f/0x3c0

  write (reordered) to 0xffff917cc15222c8 of 4 bytes by task 108 on cpu 2:
   ext4_es_init_extent+0x6aa/0xa00
   __es_insert_extent+0x477/0xaa0
   ...
   ext4_do_fallocate+0x127/0x310
   __x64_sys_fallocate+0x75/0xb0

I then saw the same pair again later in the same run (for example around
129.529391 and 129.579938), still on the same 4-byte address.

It looks like i_es_all_nr and i_es_shk_nr are documented as protected by
i_es_lock, and writers update them under i_es_lock, but
ext4_seq_es_shrinker_info_show() reads them while walking the list under
s_es_lock (the list lock), not i_es_lock.

The reproducer shape from normal userspace APIs is one reader loop running
cat /proc/fs/ext4/<sb>/es_shrinker_info while a writer loop runs fallocate,
buffered writes, punch-hole, and truncate on the same filesystem.

Since this appears to be an observational procfs stats path, would you prefer
marking these loads with data_race(...) so the intentionally approximate reads
are explicit and this path stops generating repeated KCSAN warnings?

The rough change I had in mind is:

diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index ... .. ...
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@
 int ext4_seq_es_shrinker_info_show(struct seq_file *seq, void *v)
 {
 	...
 	list_for_each_entry(ei, &sbi->s_es_list, i_es_list) {
 		inode_cnt++;
 		ei_all_nr = data_race(ei->i_es_all_nr);
 		ei_shk_nr = data_race(ei->i_es_shk_nr);
 		...
 	}

If this direction is preferred, I can send a formal patch.

Thanks,
Shuhao


* Re: [PATCH v10 10/17] cifs: Implement fileattr_get for case sensitivity
From: Steve French @ 2026-04-23 21:02 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Al Viro, Christian Brauner, Jan Kara, linux-fsdevel, linux-ext4,
	linux-xfs, linux-cifs, linux-nfs, linux-api, linux-f2fs-devel,
	hirofumi, linkinjeon, sj1557.seo, yuezhang.mo,
	almaz.alexandrovich, slava, glaubitz, frank.li, tytso,
	adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Steve French
In-Reply-To: <20260423-case-sensitivity-v10-10-c385d674a6cf@oracle.com>

Should this also check whether the SMB3.1.1 Linux Extensions or SMB
POSIX Extensions are negotiated (i.e., the server supports case
sensitivity)?

On Thu, Apr 23, 2026 at 8:20 AM Chuck Lever <cel@kernel.org> wrote:
>
> From: Chuck Lever <chuck.lever@oracle.com>
>
> Upper layers such as NFSD need a way to query whether a filesystem
> handles filenames in a case-sensitive manner. Report CIFS/SMB case
> handling behavior via the FS_XFLAG_CASEFOLD flag.
>
> CIFS servers (typically Windows or Samba) are usually case-insensitive
> but case-preserving, meaning they ignore case during lookups but store
> filenames exactly as provided.
>
> The implementation reports case sensitivity based on the nocase mount
> option, which reflects whether the client expects the server to perform
> case-insensitive comparisons. When nocase is set, the mount is reported
> as case-insensitive.
>
> The callback is registered in all three inode_operations structures
> (directory, file, and symlink) to ensure consistent reporting across
> all inode types.
>
> Acked-by: Steve French <stfrench@microsoft.com>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/smb/client/cifsfs.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
>
> diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
> index 2025739f070a..9b70ffa3e01d 100644
> --- a/fs/smb/client/cifsfs.c
> +++ b/fs/smb/client/cifsfs.c
> @@ -30,6 +30,7 @@
>  #include <linux/xattr.h>
>  #include <linux/mm.h>
>  #include <linux/key-type.h>
> +#include <linux/fileattr.h>
>  #include <uapi/linux/magic.h>
>  #include <net/ipv6.h>
>  #include "cifsfs.h"
> @@ -1199,6 +1200,22 @@ struct file_system_type smb3_fs_type = {
>  MODULE_ALIAS_FS("smb3");
>  MODULE_ALIAS("smb3");
>
> +static int cifs_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
> +{
> +       struct cifs_sb_info *cifs_sb = CIFS_SB(dentry->d_sb);
> +       struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb);
> +
> +       /*
> +        * The nocase mount option installs case-insensitive dentry
> +        * operations on this superblock. SMB preserves case on the
> +        * wire and at rest, so the mount matches FS_XFLAG_CASEFOLD
> +        * semantics: case-folded lookup, verbatim storage.
> +        */
> +       if (tcon->nocase)
> +               fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
> +       return 0;
> +}
> +
>  const struct inode_operations cifs_dir_inode_ops = {
>         .create = cifs_create,
>         .atomic_open = cifs_atomic_open,
> @@ -1217,6 +1234,7 @@ const struct inode_operations cifs_dir_inode_ops = {
>         .listxattr = cifs_listxattr,
>         .get_acl = cifs_get_acl,
>         .set_acl = cifs_set_acl,
> +       .fileattr_get = cifs_fileattr_get,
>  };
>
>  const struct inode_operations cifs_file_inode_ops = {
> @@ -1227,6 +1245,7 @@ const struct inode_operations cifs_file_inode_ops = {
>         .fiemap = cifs_fiemap,
>         .get_acl = cifs_get_acl,
>         .set_acl = cifs_set_acl,
> +       .fileattr_get = cifs_fileattr_get,
>  };
>
>  const char *cifs_get_link(struct dentry *dentry, struct inode *inode,
> @@ -1261,6 +1280,7 @@ const struct inode_operations cifs_symlink_inode_ops = {
>         .setattr = cifs_setattr,
>         .permission = cifs_permission,
>         .listxattr = cifs_listxattr,
> +       .fileattr_get = cifs_fileattr_get,
>  };
>
>  /*
>
> --
> 2.53.0
>
>


-- 
Thanks,

Steve


* Re: [PATCH v3 v3 1/2] ext4: add blocks_allocated to mb_stats output
From: Baokun Li @ 2026-04-24  2:35 UTC (permalink / raw)
  To: Baolin Liu
  Cc: tytso, adilger.kernel, ojaswin, ritesh.list, yi.zhang, linux-ext4,
	linux-kernel, wangguanyu, Baolin Liu, Andreas Dilger
In-Reply-To: <20260422015026.7170-2-liubaolin12138@163.com>



On 2026/4/22 09:50, Baolin Liu wrote:
> From: Baolin Liu <liubaolin@kylinos.cn>
>
> Add blocks_allocated to /proc/fs/ext4/<dev>/mb_stats so that the
> reported statistics match the mballoc summary printed at unmount time.
>
> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
> Signed-off-by: Baolin Liu <liubaolin@kylinos.cn>

Looks good, feel free to add:

Reviewed-by: Baokun Li <libaokun@linux.alibaba.com>

> ---
>  fs/ext4/mballoc.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 20e9fdaf4301..1e13ef62cb9d 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -3211,6 +3211,8 @@ int ext4_seq_mb_stats_show(struct seq_file *seq, void *offset)
>  			"\tTo enable, please write \"1\" to sysfs file mb_stats.\n");
>  		return 0;
>  	}
> +	seq_printf(seq, "\tblocks_allocated: %u\n",
> +		   atomic_read(&sbi->s_bal_allocated));
>  	seq_printf(seq, "\treqs: %u\n", atomic_read(&sbi->s_bal_reqs));
>  	seq_printf(seq, "\tsuccess: %u\n", atomic_read(&sbi->s_bal_success));
>  



* Re: [PATCH v3 v3 2/2] ext4: allow clearing mballoc stats through mb_stats
From: Baokun Li @ 2026-04-24  3:12 UTC (permalink / raw)
  To: Theodore Tso, Baolin Liu
  Cc: adilger.kernel, ojaswin, ritesh.list, yi.zhang, linux-ext4,
	linux-kernel, wangguanyu, Baolin Liu, Andreas Dilger
In-Reply-To: <20260423161947.GB68318@macsyma-wired.lan>


On 2026/4/24 00:19, Theodore Tso wrote:
> On Wed, Apr 22, 2026 at 09:50:25AM +0800, Baolin Liu wrote:
>> From: Baolin Liu <liubaolin@kylinos.cn>
>>
>> Make /proc/fs/ext4/<dev>/mb_stats writable and clear the runtime
>> mballoc statistics when 0 is written.
> At the moment to enable mb_stats the system administrator needs to
> write "1" to /sys/fs/ext4/<dev>/mb_stats, and writing "0" to the sysfs
> file will pause the statistics collection (but not clear the
> statistics).  Adding a way to clear the statistics by writing to the
> procfs file might be confusing to users.
>
> So.... as a suggestion, if you're adding the ability to write to
> /proc/fs/.../mb_stats, what if we make things work by
>
>    * Write 1 to /proc/fs/.../mb_stats to  enable statistics collection
>    * Write 0 to /proc/fs/.../mb_stats to  disable statistics collection
>    * Write -1 to /proc/fs/.../mb_stats to clear statistics counters
>
> And then deprecate the /sys/fs/.../mb_stats variable (but we probably
> won't be able to remove it for at least a year or two).

I like this idea. Consolidating everything into /proc/fs/.../mb_stats
and using -1 to reset the counters is much cleaner than the current
split approach.


Cheers,
Baokun



* Re: [PATCH v3 v3 2/2] ext4: allow clearing mballoc stats through mb_stats
From: liubaolin @ 2026-04-24  8:09 UTC (permalink / raw)
  To: Theodore Tso, libaokun
  Cc: adilger.kernel, ojaswin, ritesh.list, yi.zhang, linux-ext4,
	linux-kernel, wangguanyu, Baolin Liu, Andreas Dilger
In-Reply-To: <20260423161947.GB68318@macsyma-wired.lan>



On 2026/4/24 0:19, Theodore Tso wrote:
> On Wed, Apr 22, 2026 at 09:50:25AM +0800, Baolin Liu wrote:
>> From: Baolin Liu <liubaolin@kylinos.cn>
>>
>> Make /proc/fs/ext4/<dev>/mb_stats writable and clear the runtime
>> mballoc statistics when 0 is written.
> 
> At the moment to enable mb_stats the system administrator needs to
> write "1" to /sys/fs/ext4/<dev>/mb_stats, and writing "0" to the sysfs
> file will pause the statistics collection (but not clear the
> statistics).  Adding a way to clear the statistics by writing to the
> procfs file might be confusing to users.
> 
> So.... as a suggestion, if you're adding the ability to write to
> /proc/fs/.../mb_stats, what if we make things work by
> 
>     * Write 1 to /proc/fs/.../mb_stats to  enable statistics collection
>     * Write 0 to /proc/fs/.../mb_stats to  disable statistics collection
>     * Write -1 to /proc/fs/.../mb_stats to clear statistics counters
> 
> And then deprecate the /sys/fs/.../mb_stats variable (but we probably
> won't be able to remove it for at least a year or two).
> 
> 	       	       	  	    	- Ted
Dear Ted, Baokun,
    Thank you for your review and suggestions.
    Since you mentioned that /sys/fs/.../mb_stats cannot be deleted in
    the short term, I plan to modify and submit a v4 patch according to
    the following strategy.

    1. Change `/proc/fs/.../mb_stats` to read-write mode.
	* Read `/proc/fs/.../mb_stats` to show statistics counters.
	* Write 0 to `/proc/fs/.../mb_stats` to disable statistics collection.
	* Write 1 to `/proc/fs/.../mb_stats` to enable statistics collection.
	* Write 2 to `/proc/fs/.../mb_stats` to clear statistics counters.

    2. Do not delete the `/sys/fs/.../mb_stats` node for now; implement
       the same write control logic.
	* Write 0 to `/sys/fs/.../mb_stats` to disable statistics collection.
	* Write 1 to `/sys/fs/.../mb_stats` to enable statistics collection.
	* Write 2 to `/sys/fs/.../mb_stats` to clear statistics counters.

	Delete `/sys/fs/.../mb_stats` later when it is possible to delete it.

    3. Modify the relevant documentation for `mb_stats`.
	Documentation/ABI/testing/sysfs-fs-ext4
	Documentation/admin-guide/ext4.rst
	Documentation/filesystems/proc.rst

    Compared to your suggestion, I recommend using the value 2 for the
    clear operation because s_mb_stats is an unsigned int variable, and
    using -1 requires changing the variable type.
    I suggest avoiding changing the s_mb_stats variable type unless
    absolutely necessary.

    Do you think this modification is appropriate?
    If there are no problems, I will start modifying the code and submit
    the v4 patch as soon as possible.



* [PATCH v2] generic/790: test post-EOF gap zeroing persistence
From: Zhang Yi @ 2026-04-24  9:22 UTC (permalink / raw)
  To: fstests, zlang
  Cc: linux-ext4, linux-fsdevel, bfoster, jack, yi.zhang, yi.zhang,
	yizhang089, yangerkun

From: Zhang Yi <yi.zhang@huawei.com>

Test that extending a file past a non-block-aligned EOF correctly
zero-fills the gap [old_EOF, block_boundary), and that this zeroing
persists through a filesystem shutdown+remount cycle.

Stale data beyond EOF can persist on disk when append write data blocks
are flushed before the on-disk file size update, or when concurrent
append writeback and mmap writes persist non-zero data past EOF.
Subsequent post-EOF operations (append write, fallocate, truncate up)
must zero-fill and persist the gap to prevent exposing stale data.

The test pollutes the file's last physical block (via FIEMAP + raw
device write) with a sentinel pattern beyond i_size, then performs each
extend operation and verifies the gap is zeroed both in memory and on
disk.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
v1->v2:
 - Add _require_no_realtime to prevent testing on XFS realtime devices,
   where file data may reside on $SCRATCH_RTDEV.
 - Add _exclude_fs btrfs since FIEMAP returns logical addresses, not
   physical device offsets; writing to these offsets on $SCRATCH_DEV
   would corrupt the filesystem in multi-device setups. Besides, btrfs
   doesn't support shutdown right now, so support can be added later.
 - Add -v flag to od in _check_gap_zero() to prevent line folding of
   identical consecutive lines.
 - Add expected_new_sz parameter to _test_eof_zeroing(), verify file
   size was not rolled back after shutdown+remount cycle, and drop the
   unnecessary file size check before the shutdown.
 - Clarify the comment regarding when stale data beyond EOF can persist.

 tests/generic/790     | 164 ++++++++++++++++++++++++++++++++++++++++++
 tests/generic/790.out |   4 ++
 2 files changed, 168 insertions(+)
 create mode 100755 tests/generic/790
 create mode 100644 tests/generic/790.out

diff --git a/tests/generic/790 b/tests/generic/790
new file mode 100755
index 00000000..2adc06f8
--- /dev/null
+++ b/tests/generic/790
@@ -0,0 +1,164 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2026 Huawei.  All Rights Reserved.
+#
+# FS QA Test No. 790
+#
+# Test that extending a file past a non-block-aligned EOF correctly zero-fills
+# the gap [old_EOF, block_boundary), and that this zeroing persists through a
+# filesystem shutdown+remount cycle.
+#
+# Stale data beyond EOF can persist on disk when:
+# 1) append write data blocks are flushed before the on-disk file size update,
+#    and the system crashes in this window.
+# 2) concurrent append writeback and mmap writes persist non-zero data past EOF.
+#
+# Subsequent post-EOF operations (append write, fallocate, truncate up) must
+# zero-fill and persist the gap to prevent exposing stale data.
+#
+# The test pollutes the file's last physical block (via FIEMAP + raw device
+# write) with a sentinel pattern beyond i_size, then performs each extend
+# operation and verifies the gap is zeroed both in memory and on disk.
+#
+. ./common/preamble
+_begin_fstest auto quick rw shutdown
+
+. ./common/filter
+
+_require_scratch
+_require_block_device $SCRATCH_DEV
+_require_no_realtime
+_require_scratch_shutdown
+_require_metadata_journaling $SCRATCH_DEV
+
+# FIEMAP on Btrfs returns logical addresses within the filesystem's address
+# space, not physical device offsets. Writing to these offsets on $SCRATCH_DEV
+# would corrupt the filesystem in multi-device setups.
+_exclude_fs btrfs
+
+_require_xfs_io_command "fiemap"
+_require_xfs_io_command "falloc"
+_require_xfs_io_command "pwrite"
+_require_xfs_io_command "truncate"
+_require_xfs_io_command "sync_range"
+
+# Check that gap region [offset, offset+nbytes) is entirely zero
+_check_gap_zero()
+{
+	local file="$1"
+	local offset="$2"
+	local nbytes="$3"
+	local label="$4"
+	local data
+	local stripped
+
+	data=$(od -A n -t x1 -v -j $offset -N $nbytes "$file" 2>/dev/null)
+
+	# Remove whitespace and check if any byte is non-zero
+	stripped=$(printf '%s' "$data" | tr -d ' \n\t')
+	if [ -n "$stripped" ] && ! echo "$stripped" | grep -qE "^0+$"; then
+		echo "FAIL: non-zero data in gap [$offset,$((offset + nbytes))) $label"
+		_hexdump -N $((offset + nbytes)) "$file"
+		return 1
+	fi
+	return 0
+}
+
+# Get the physical block offset (in bytes) of the file's first block on device
+_get_phys_offset()
+{
+	local file="$1"
+	local fiemap_output
+	local phys_blk
+
+	fiemap_output=$($XFS_IO_PROG -r -c "fiemap -v" "$file" 2>/dev/null)
+	phys_blk=$(echo "$fiemap_output" | _filter_xfs_io_fiemap | head -1 | awk '{print $3}')
+	if [ -z "$phys_blk" ]; then
+		echo ""
+		return
+	fi
+	# Convert 512-byte blocks to bytes
+	echo $((phys_blk * 512))
+}
+
+_test_eof_zeroing()
+{
+	local test_name="$1"
+	local extend_cmd="$2"
+	local expected_new_sz="$3"
+	local file=$SCRATCH_MNT/testfile_${test_name}
+
+	echo "$test_name" | tee -a $seqres.full
+
+	# Compute non-block-aligned EOF offset
+	local gap_bytes=16
+	local eof_offset=$((blksz - gap_bytes))
+
+	# Step 1: Write one full block to ensure the filesystem allocates a
+	#         physical block for the file instead of using inline data.
+	$XFS_IO_PROG -f -c "pwrite -S 0x5a 0 $blksz" -c fsync \
+		"$file" >> $seqres.full 2>&1
+
+	# Step 2: Get physical block offset on device via FIEMAP
+	local phys_offset
+	phys_offset=$(_get_phys_offset "$file")
+	if [ -z "$phys_offset" ]; then
+		_fail "$test_name: failed to get physical block offset via fiemap"
+	fi
+
+	# Step 3: Truncate file to non-block-aligned size and fsync.
+	#         The on-disk region [eof_offset, blksz) may or may not be
+	#         zeroed by the filesystem at this point.
+	$XFS_IO_PROG -c "truncate $eof_offset" -c fsync \
+		"$file" >> $seqres.full 2>&1
+
+	# Step 4: Unmount and restore the physical block to all-0x5a on disk.
+	#         This bypasses the kernel's pagecache EOF-zeroing to ensure
+	#         the stale pattern is present on disk. Then remount.
+	_scratch_unmount
+	$XFS_IO_PROG -d -c "pwrite -S 0x5a $phys_offset $blksz" \
+		$SCRATCH_DEV >> $seqres.full 2>&1
+	_scratch_mount >> $seqres.full 2>&1
+
+	# Step 5: Execute the extend operation.
+	$XFS_IO_PROG -c "$extend_cmd" "$file" >> $seqres.full 2>&1
+
+	# Step 6: Verify gap [eof_offset, blksz) is zeroed BEFORE shutdown
+	_check_gap_zero "$file" $eof_offset $gap_bytes "before shutdown" || return 1
+
+	# Step 7: Sync the extended range and shut down the filesystem with
+	#         journal flush. This persists the file size extension, and
+	#         the filesystem should persist the zeroed data in the gap
+	#         range as well.
+	if [ "$extend_cmd" != "${extend_cmd#pwrite}" ]; then
+		$XFS_IO_PROG -c "sync_range -w $blksz $blksz" \
+			"$file" >> $seqres.full 2>&1
+	fi
+	_scratch_shutdown -f
+
+	# Step 8: Remount and verify gap is still zeroed
+	_scratch_cycle_mount
+
+	# Verify file size was not rolled back after shutdown+remount
+	local sz
+	sz=$(stat -c %s "$file")
+	if [ "$sz" -ne "$expected_new_sz" ]; then
+		_fail "$test_name: file size rolled back after shutdown+remount: $sz != $expected_new_sz"
+	fi
+
+	_check_gap_zero "$file" $eof_offset $gap_bytes "after shutdown+remount" || return 1
+}
+
+_scratch_mkfs >> $seqres.full 2>&1
+_scratch_mount
+
+blksz=$(_get_block_size $SCRATCH_MNT)
+
+# Test three variants of EOF-extending operations
+_test_eof_zeroing "append_write" "pwrite -S 0x42 $blksz $blksz" $((blksz * 2))
+_test_eof_zeroing "truncate_up" "truncate $((blksz * 2))" $((blksz * 2))
+_test_eof_zeroing "fallocate" "falloc $blksz $blksz" $((blksz * 2))
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/790.out b/tests/generic/790.out
new file mode 100644
index 00000000..e5e2cc09
--- /dev/null
+++ b/tests/generic/790.out
@@ -0,0 +1,4 @@
+QA output created by 790
+append_write
+truncate_up
+fallocate
-- 
2.52.0
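
For readers less familiar with the shell helpers, the gap check performed by
_check_gap_zero() can be approximated in C. This is only a minimal userspace
sketch under stated assumptions: gap_is_zero() and gap_to_block_boundary()
are hypothetical names that do not appear in the patch, and the sketch reads
through the page cache with pread() rather than inspecting the raw device.

```c
#define _XOPEN_SOURCE 700 /* for pread() under strict -std modes */
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of fstests.
 * Returns 1 if bytes [offset, offset + nbytes) of fd are all zero,
 * 0 if any byte is non-zero, and -1 on read error or short file.
 */
static int gap_is_zero(int fd, off_t offset, size_t nbytes)
{
	unsigned char buf[4096];

	while (nbytes > 0) {
		size_t want = nbytes < sizeof(buf) ? nbytes : sizeof(buf);
		ssize_t got = pread(fd, buf, want, offset);

		if (got <= 0)
			return -1;
		for (ssize_t i = 0; i < got; i++)
			if (buf[i] != 0)
				return 0;
		offset += got;
		nbytes -= (size_t)got;
	}
	return 1;
}

/*
 * Hypothetical helper: length of the gap from a non-block-aligned EOF
 * up to the next block boundary; 0 when EOF is already aligned.
 */
static off_t gap_to_block_boundary(off_t eof, off_t blksz)
{
	off_t rem = eof % blksz;

	return rem ? blksz - rem : 0;
}
```

With a 4096-byte block and EOF at 4080, gap_to_block_boundary() gives the
16-byte gap the test uses as gap_bytes, and gap_is_zero(fd, 4080, 16) is the
in-memory half of the verification. The on-disk half still needs the
shutdown and remount cycle that the test performs.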



* Re: [PATCH v3 v3 2/2] ext4: allow clearing mballoc stats through mb_stats
From: Baokun Li @ 2026-04-24  9:34 UTC (permalink / raw)
  To: liubaolin, Theodore Tso
  Cc: adilger.kernel, ojaswin, ritesh.list, yi.zhang, linux-ext4,
	linux-kernel, wangguanyu, Baolin Liu, Andreas Dilger
In-Reply-To: <592456a9-ce45-4967-a7c4-4ed80e908bac@163.com>


On 2026/4/24 16:09, liubaolin wrote:
>
>
> On 2026/4/24 0:19, Theodore Tso wrote:
>> On Wed, Apr 22, 2026 at 09:50:25AM +0800, Baolin Liu wrote:
>>> From: Baolin Liu <liubaolin@kylinos.cn>
>>>
>>> Make /proc/fs/ext4/<dev>/mb_stats writable and clear the runtime
>>> mballoc statistics when 0 is written.
>>
>> At the moment to enable mb_stats the system administrator needs to
>> write "1" to /sys/fs/ext4/<dev>/mb_stats, and writing "0" to the sysfs
>> file will pause the statistics collection (but not clear the
>> statistics).  Adding a way to clear the statistics by writing to the
>> procfs file might be confusing to users.
>>
>> So.... as a suggestion, if you're adding the ability to write to
>> /proc/fs/.../mb_stats, what if we make things work by
>>
>>     * Write 1 to /proc/fs/.../mb_stats to  enable statistics collection
>>     * Write 0 to /proc/fs/.../mb_stats to  disable statistics collection
>>     * Write -1 to /proc/fs/.../mb_stats to clear statistics counters
>>
>> And then deprecate the /sys/fs/.../mb_stats variable (but we probably
>> won't be able to remove it for at least a year or two).
>>
>>                                         - Ted
> Dear Ted, Baokun,
>    Thank you for your review and suggestions.
>    Since you mentioned that /sys/fs/.../mb_stats cannot be deleted in
> the short term,
>    I plan to modify and submit a v4 patch according to the following
> strategy.
>
>    1. Change `/proc/fs/.../mb_stats` to read-write mode.
>     * Read `/proc/fs/.../mb_stats` to show statistics counters.
>     * Write 0 to `/proc/fs/.../mb_stats` to disable statistics
> collection.
>     * Write 1 to `/proc/fs/.../mb_stats` to enable statistics collection.
>     * Write 2 to `/proc/fs/.../mb_stats` to clear statistics counters.
>
>    2. Do not delete the `/sys/fs/.../mb_stats` node for now; implement
> the same write control logic.
>     * Write 0 to `/sys/fs/.../mb_stats` to disable statistics collection.
>     * Write 1 to `/sys/fs/.../mb_stats` to enable statistics collection.
>     * Write 2 to `/sys/fs/.../mb_stats` to clear statistics counters.
>
>     Delete `/sys/fs/.../mb_stats` later when it is possible to delete it.
>
>    3. Modify the relevant documentation for `mb_stats`.
>     Documentation/ABI/testing/sysfs-fs-ext4
>     Documentation/admin-guide/ext4.rst
>     Documentation/filesystems/proc.rst
>
>    Compared to your suggestion, I recommend using the value 2 for the
> clear operation because s_mb_stats is an unsigned int variable, and
> using -1 requires changing the variable type.
>    I suggest avoiding changing the s_mb_stats variable type unless
> absolutely necessary.
>
>    Do you think this modification is appropriate?
>    If there are no problems, I will start modifying the code and
> submit the v4 patch as soon as possible.

The clear command should only be acted on, not stored, so s_mb_stats
remains unchanged and still holds only 0 or a non-zero value to represent
disabled and enabled, respectively. Otherwise, you will have to update
a large number of s_mb_stats checks.

That means the /sys/fs/.../mb_stats interface does not need to support
clearing, but it might make sense to add a deprecation warning there.

Then in `/proc/fs/.../mb_stats`, writing 0 or a positive number passes
it to s_mb_stats, writing -1 performs a reset, and other negative values
return -EINVAL.


Cheers,
Baokun
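
The write semantics described above (0 or a positive value is stored, -1
resets the counters, other negative values return -EINVAL) can be sketched
roughly as follows. This is a userspace model under stated assumptions, not
the proposed kernel code: struct mb_stats_model and both functions are
hypothetical, and plain integers stand in for the atomic counters in
struct ext4_sb_info.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical userspace model of the per-filesystem state; not kernel code. */
struct mb_stats_model {
	unsigned int s_mb_stats;    /* 0 = collection disabled, non-zero = enabled */
	unsigned int s_bal_reqs;    /* stands in for the atomic counters */
	unsigned int s_bal_success;
};

/* Hypothetical helper: zero the counters without touching s_mb_stats. */
static void mb_stats_model_clear(struct mb_stats_model *m)
{
	m->s_bal_reqs = 0;
	m->s_bal_success = 0;
}

/*
 * Hypothetical write handler for the procfs file.
 * Returns 0 on success or -EINVAL for unparsable or unsupported input.
 */
static int mb_stats_model_write(struct mb_stats_model *m, const char *buf)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(buf, &end, 10);
	if (errno || end == buf)
		return -EINVAL;
	/* Accept one trailing newline, as "echo" would send. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	if (val == -1) {
		/* Clear is handled here and never stored in s_mb_stats. */
		mb_stats_model_clear(m);
		return 0;
	}
	if (val < 0 || (unsigned long)val > UINT_MAX)
		return -EINVAL;
	m->s_mb_stats = (unsigned int)val;
	return 0;
}
```

Because -1 is consumed by the handler and never assigned, s_mb_stats can stay
an unsigned int and the existing zero versus non-zero checks need no change.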




* [PATCH] ext4: fix LOGFLUSH shutdown ordering to allow ordered-mode data writeback
From: Zhang Yi @ 2026-04-24 10:42 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, libaokun,
	jack, ojaswin, ritesh.list, yi.zhang, yi.zhang, yizhang089,
	yangerkun, yukuai

From: Zhang Yi <yi.zhang@huawei.com>

In EXT4_GOING_FLAGS_LOGFLUSH mode, the EXT4_FLAGS_SHUTDOWN flag was set
before calling ext4_force_commit().  This caused ordered-mode data
writeback (triggered by journal commit) to fail with -EIO, since
ext4_do_writepages() checks for the shutdown flag.  The journal would
then be aborted prematurely before the commit could succeed.

Fix this by calling ext4_force_commit() first, then setting the
shutdown flag, so that pending data can be written back correctly.

Note that moving ext4_force_commit() before setting the shutdown flag
creates a small window in which new writes may occur and generate new
journal transactions.  When the journal is subsequently aborted, the
new transactions will not be able to write to disk.  This is intentional
because LOGFLUSH's semantics are to flush pre-existing journal entries
before shutdown, not to guarantee atomicity for writes that race with
the ioctl.

Fixes: 783d94854499 ("ext4: add EXT4_IOC_GOINGDOWN ioctl")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
This fix addresses my new generic/790 test, which fails during the file
size verification after shutdown and remount.

 https://lore.kernel.org/fstests/20260424092228.1396658-1-yi.zhang@huaweicloud.com/

 fs/ext4/ioctl.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 1d0c3d4bdf47..110e3fb194ec 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -830,11 +830,17 @@ int ext4_force_shutdown(struct super_block *sb, u32 flags)
 		bdev_thaw(sb->s_bdev);
 		break;
 	case EXT4_GOING_FLAGS_LOGFLUSH:
+		/*
+		 * Call ext4_force_commit() before setting EXT4_FLAGS_SHUTDOWN.
+		 * This is because in data=ordered mode, journal commit
+		 * triggers data writeback which fails if shutdown is already
+		 * set, causing the journal to be aborted prematurely before
+		 * the commit succeeds.
+		 */
+		(void) ext4_force_commit(sb);
 		set_bit(EXT4_FLAGS_SHUTDOWN, &sbi->s_ext4_flags);
-		if (sbi->s_journal && !is_journal_aborted(sbi->s_journal)) {
-			(void) ext4_force_commit(sb);
+		if (sbi->s_journal && !is_journal_aborted(sbi->s_journal))
 			jbd2_journal_abort(sbi->s_journal, -ESHUTDOWN);
-		}
 		break;
 	case EXT4_GOING_FLAGS_NOLOGFLUSH:
 		set_bit(EXT4_FLAGS_SHUTDOWN, &sbi->s_ext4_flags);
-- 
2.52.0
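
The ordering problem the commit message describes can be illustrated with a
small userspace model. This is a deliberately simplified sketch, not ext4
code: struct fs_model and the model_*() functions are hypothetical stand-ins
for the shutdown flag, ordered-mode writeback, and the journal commit, and
the error values are chosen only to mirror the -EIO mentioned above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model; field and function names are illustrative. */
struct fs_model {
	bool shutdown;        /* models EXT4_FLAGS_SHUTDOWN */
	bool journal_aborted; /* models an aborted journal */
	int dirty_ordered;    /* ordered-mode data pages awaiting writeback */
	int committed;        /* transactions that reached "disk" */
};

/* Data writeback refuses to run once the shutdown flag is set. */
static int model_writepages(struct fs_model *fs)
{
	if (fs->shutdown)
		return -EIO;
	fs->dirty_ordered = 0;
	return 0;
}

/* In ordered mode, a commit must write back the data first. */
static int model_force_commit(struct fs_model *fs)
{
	int err;

	if (fs->journal_aborted)
		return -EROFS;
	err = model_writepages(fs);
	if (err) {
		fs->journal_aborted = true; /* a failed commit aborts the journal */
		return err;
	}
	fs->committed++;
	return 0;
}

/* Ordering before the patch: set the flag first, then try to commit. */
static int model_logflush_old(struct fs_model *fs)
{
	int err = 0;

	fs->shutdown = true;
	if (!fs->journal_aborted) {
		err = model_force_commit(fs);
		fs->journal_aborted = true;
	}
	return err;
}

/* Ordering after the patch: commit first, then set the flag and abort. */
static int model_logflush_new(struct fs_model *fs)
{
	int err = model_force_commit(fs);

	fs->shutdown = true;
	fs->journal_aborted = true;
	return err;
}
```

In this model the old ordering leaves the dirty data unwritten and nothing
committed, while the new ordering commits once before the flag is set. The
model ignores the race window for new writes that the commit message calls
out as intentional.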


* Re: [PATCH v3 v3 2/2] ext4: allow clearing mballoc stats through mb_stats
From: Theodore Tso @ 2026-04-24 12:07 UTC (permalink / raw)
  To: liubaolin
  Cc: libaokun, adilger.kernel, ojaswin, ritesh.list, yi.zhang,
	linux-ext4, linux-kernel, wangguanyu, Baolin Liu, Andreas Dilger
In-Reply-To: <592456a9-ce45-4967-a7c4-4ed80e908bac@163.com>

On Fri, Apr 24, 2026 at 04:09:31PM +0800, liubaolin wrote:
> 
>    2. Do not delete the `/sys/fs/.../mb_stats` node for now; implement the
> same write control logic.
> 	* Write 0 to `/sys/fs/.../mb_stats` to disable statistics collection.
> 	* Write 1 to `/sys/fs/.../mb_stats` to enable statistics collection.
> 	* Write 2 to `/sys/fs/.../mb_stats` to clear statistics counters.

We could do that, but note that currently writing to
/sys/fs/.../mb_stats just sets an unsigned integer in
EXT4_SB(sb)->s_mb_stats.  There is no ext4-specific function that runs
when /sys/fs/.../mb_stats is updated.

So either you have to add some check in fs/ext4/mballoc.c which gets
called every single time a block allocation happens --- and consider
the race condition where two CPUs are checking s_mb_stats at the same
time, and the desirability of adding a spinlock that would need to be
taken every single time a block allocation happens --- or you have to
add an ext4-specific function in fs/ext4/sysfs.c.

>    Compared to your suggestion, I recommend using the value 2 for the clear
> operation because s_mb_stats is an unsigned int variable, and using -1
> requires changing the variable type.

Well, since you have introduced an ext4-specific function which gets
called when writing to the procfs file, that function can clear the
statistics counters when -1 is written to the file --- and then set
s_mb_stats to 1.

Cheers,

						- Ted


* Re: [PATCH v2] generic/790: test post-EOF gap zeroing persistence
From: Brian Foster @ 2026-04-24 13:09 UTC (permalink / raw)
  To: Zhang Yi
  Cc: fstests, zlang, linux-ext4, linux-fsdevel, jack, yi.zhang,
	yizhang089, yangerkun
In-Reply-To: <20260424092228.1396658-1-yi.zhang@huaweicloud.com>

On Fri, Apr 24, 2026 at 05:22:28PM +0800, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> Test that extending a file past a non-block-aligned EOF correctly
> zero-fills the gap [old_EOF, block_boundary), and that this zeroing
> persists through a filesystem shutdown+remount cycle.
> 
> Stale data beyond EOF can persist on disk when append write data blocks
> are flushed before the on-disk file size update, or when concurrent
> append writeback and mmap writes persist non-zero data past EOF.
> Subsequent post-EOF operations (append write, fallocate, truncate up)
> must zero-fill and persist the gap to prevent exposing stale data.
> 
> The test pollutes the file's last physical block (via FIEMAP + raw
> device write) with a sentinel pattern beyond i_size, then performs each
> extend operation and verifies the gap is zeroed both in memory and on
> disk.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> ---
> v1->v2:
>  - Add _require_no_realtime to prevent testing on XFS realtime devices,
>    where file data may reside on $SCRATCH_RTDEV.
>  - Add _exclude_fs btrfs since FIEMAP returns logical addresses, not
>    physical device offsets, writing to these offsets on $SCRATCH_DEV
>    would corrupt the filesystem in multi-device setups. Besides, since
>    btrfs doesn't support shutdown right now, we can support it later.
>  - Add -v flag to od in _check_gap_zero() to prevent line folding of
>    identical consecutive lines.
>  - Add expected_new_sz parameter to _test_eof_zeroing(), verify file
>    size was not rolled back after shutdown+remount cycle, and also drop
>    the unnecessary file size check before the shutdown as well.
>  - Clarify the comment regarding when stale data beyond EOF can persist.
> 

Thanks for the tweaks. This all LGTM from a review standpoint. I gave it
a quick test on latest master and I see a few failures in a couple runs:

- On XFS (mkfs defaults) I saw one unexpected i_size failure and one
  zeroing failure, both on write extension fwiw.
- On ext4 I saw a few unexpected i_size failures (both with mkfs
  defaults and 1k block size).

I haven't dug into anything beyond that. Does this match what you're
seeing on current kernels or are these unexpected failures?

Brian

>  tests/generic/790     | 164 ++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/790.out |   4 ++
>  2 files changed, 168 insertions(+)
>  create mode 100755 tests/generic/790
>  create mode 100644 tests/generic/790.out
> 
> diff --git a/tests/generic/790 b/tests/generic/790
> new file mode 100755
> index 00000000..2adc06f8
> --- /dev/null
> +++ b/tests/generic/790
> @@ -0,0 +1,164 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2026 Huawei.  All Rights Reserved.
> +#
> +# FS QA Test No. 790
> +#
> +# Test that extending a file past a non-block-aligned EOF correctly zero-fills
> +# the gap [old_EOF, block_boundary), and that this zeroing persists through a
> +# filesystem shutdown+remount cycle.
> +#
> +# Stale data beyond EOF can persist on disk when:
> +# 1) append write data blocks are flushed before the on-disk file size update,
> +#    and the system crashes in this window.
> +# 2) concurrent append writeback and mmap writes persist non-zero data past EOF.
> +#
> +# Subsequent post-EOF operations (append write, fallocate, truncate up) must
> +# zero-fill and persist the gap to prevent exposing stale data.
> +#
> +# The test pollutes the file's last physical block (via FIEMAP + raw device
> +# write) with a sentinel pattern beyond i_size, then performs each extend
> +# operation and verifies the gap is zeroed both in memory and on disk.
> +#
> +. ./common/preamble
> +_begin_fstest auto quick rw shutdown
> +
> +. ./common/filter
> +
> +_require_scratch
> +_require_block_device $SCRATCH_DEV
> +_require_no_realtime
> +_require_scratch_shutdown
> +_require_metadata_journaling $SCRATCH_DEV
> +
> +# FIEMAP on Btrfs returns logical addresses within the filesystem's address
> +# space, not physical device offsets. Writing to these offsets on $SCRATCH_DEV
> +# would corrupt the filesystem in multi-device setups.
> +_exclude_fs btrfs
> +
> +_require_xfs_io_command "fiemap"
> +_require_xfs_io_command "falloc"
> +_require_xfs_io_command "pwrite"
> +_require_xfs_io_command "truncate"
> +_require_xfs_io_command "sync_range"
> +
> +# Check that gap region [offset, offset+nbytes) is entirely zero
> +_check_gap_zero()
> +{
> +	local file="$1"
> +	local offset="$2"
> +	local nbytes="$3"
> +	local label="$4"
> +	local data
> +	local stripped
> +
> +	data=$(od -A n -t x1 -v -j $offset -N $nbytes "$file" 2>/dev/null)
> +
> +	# Remove whitespace and check if any byte is non-zero
> +	stripped=$(printf '%s' "$data" | tr -d ' \n\t')
> +	if [ -n "$stripped" ] && ! echo "$stripped" | grep -qE "^0+$"; then
> +		echo "FAIL: non-zero data in gap [$offset,$((offset + nbytes))) $label"
> +		_hexdump -N $((offset + nbytes)) "$file"
> +		return 1
> +	fi
> +	return 0
> +}
> +
> +# Get the physical block offset (in bytes) of the file's first block on device
> +_get_phys_offset()
> +{
> +	local file="$1"
> +	local fiemap_output
> +	local phys_blk
> +
> +	fiemap_output=$($XFS_IO_PROG -r -c "fiemap -v" "$file" 2>/dev/null)
> +	phys_blk=$(echo "$fiemap_output" | _filter_xfs_io_fiemap | head -1 | awk '{print $3}')
> +	if [ -z "$phys_blk" ]; then
> +		echo ""
> +		return
> +	fi
> +	# Convert 512-byte blocks to bytes
> +	echo $((phys_blk * 512))
> +}
> +
> +_test_eof_zeroing()
> +{
> +	local test_name="$1"
> +	local extend_cmd="$2"
> +	local expected_new_sz="$3"
> +	local file=$SCRATCH_MNT/testfile_${test_name}
> +
> +	echo "$test_name" | tee -a $seqres.full
> +
> +	# Compute non-block-aligned EOF offset
> +	local gap_bytes=16
> +	local eof_offset=$((blksz - gap_bytes))
> +
> +	# Step 1: Write one full block to ensure the filesystem allocates a
> +	#         physical block for the file instead of using inline data.
> +	$XFS_IO_PROG -f -c "pwrite -S 0x5a 0 $blksz" -c fsync \
> +		"$file" >> $seqres.full 2>&1
> +
> +	# Step 2: Get physical block offset on device via FIEMAP
> +	local phys_offset
> +	phys_offset=$(_get_phys_offset "$file")
> +	if [ -z "$phys_offset" ]; then
> +		_fail "$test_name: failed to get physical block offset via fiemap"
> +	fi
> +
> +	# Step 3: Truncate file to non-block-aligned size and fsync.
> +	#         The on-disk region [eof_offset, blksz) may or may not be
> +	#         zeroed by the filesystem at this point.
> +	$XFS_IO_PROG -c "truncate $eof_offset" -c fsync \
> +		"$file" >> $seqres.full 2>&1
> +
> +	# Step 4: Unmount and restore the physical block to all-0x5a on disk.
> +	#         This bypasses the kernel's pagecache EOF-zeroing to ensure
> +	#         the stale pattern is present on disk. Then remount.
> +	_scratch_unmount
> +	$XFS_IO_PROG -d -c "pwrite -S 0x5a $phys_offset $blksz" \
> +		$SCRATCH_DEV >> $seqres.full 2>&1
> +	_scratch_mount >> $seqres.full 2>&1
> +
> +	# Step 5: Execute the extend operation.
> +	$XFS_IO_PROG -c "$extend_cmd" "$file" >> $seqres.full 2>&1
> +
> +	# Step 6: Verify gap [eof_offset, blksz) is zeroed BEFORE shutdown
> +	_check_gap_zero "$file" $eof_offset $gap_bytes "before shutdown" || return 1
> +
> +	# Step 7: Sync the extended range and shutdown the filesystem with
> +	#         journal flush. This persists the file size extending, and
> +	#         the filesystem should persist the zeroed data in the gap
> +	#         range as well.
> +	if [ "$extend_cmd" != "${extend_cmd#pwrite}" ]; then
> +		$XFS_IO_PROG -c "sync_range -w $blksz $blksz" \
> +			"$file" >> $seqres.full 2>&1
> +	fi
> +	_scratch_shutdown -f
> +
> +	# Step 8: Remount and verify gap is still zeroed
> +	_scratch_cycle_mount
> +
> +	# Verify file size was not rolled back after shutdown+remount
> +	local sz
> +	sz=$(stat -c %s "$file")
> +	if [ "$sz" -ne "$expected_new_sz" ]; then
> +		_fail "$test_name: file size rolled back after shutdown+remount: $sz != $expected_new_sz"
> +	fi
> +
> +	_check_gap_zero "$file" $eof_offset $gap_bytes "after shutdown+remount" || return 1
> +}
> +
> +_scratch_mkfs >> $seqres.full 2>&1
> +_scratch_mount
> +
> +blksz=$(_get_block_size $SCRATCH_MNT)
> +
> +# Test three variants of EOF-extending operations
> +_test_eof_zeroing "append_write" "pwrite -S 0x42 $blksz $blksz" $((blksz * 2))
> +_test_eof_zeroing "truncate_up" "truncate $((blksz * 2))" $((blksz * 2))
> +_test_eof_zeroing "fallocate" "falloc $blksz $blksz" $((blksz * 2))
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/generic/790.out b/tests/generic/790.out
> new file mode 100644
> index 00000000..e5e2cc09
> --- /dev/null
> +++ b/tests/generic/790.out
> @@ -0,0 +1,4 @@
> +QA output created by 790
> +append_write
> +truncate_up
> +fallocate
> -- 
> 2.52.0
> 


^ permalink raw reply

* [PATCH] ext4: Use %pe to print PTR_ERR() in namei.c
From: Abdellah Ouhbi @ 2026-04-24 15:22 UTC (permalink / raw)
  To: tytso
  Cc: linux-ext4, linux-kernel, skhan, me, linux-kernel-mentees,
	Abdellah Ouhbi

Fix coccicheck warning
./namei.c:150:25-32: WARNING: Consider using %pe to print PTR_ERR()

Replace %ld with %pe and PTR_ERR(bh) with bh pointer.
The %pe specifier automatically converts error pointers to
human-readable error names instead of raw error codes.
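
For readers unfamiliar with error pointers, here is a hedged userspace
sketch of the idea behind %pe.  It is an emulation, not the kernel's
implementation: every model_* name is hypothetical, the errno table is
deliberately tiny, and the kernel's symbolic output also depends on its
build configuration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MODEL_MAX_ERRNO 4095	/* error pointers occupy the top 4095 addresses */

/* Encode a negative errno value as a pointer, as ERR_PTR() does. */
static void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True if the pointer falls in the range reserved for encoded errors. */
static int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Recover the negative errno value from an error pointer. */
static long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Tiny illustrative name table; returns NULL for codes it does not know. */
static const char *model_errname(long err)
{
	switch (-err) {
	case ENOENT:	return "-ENOENT";
	case EIO:	return "-EIO";
	case ENOMEM:	return "-ENOMEM";
	default:	return NULL;
	}
}

/*
 * Format a pointer the way this sketch assumes %pe behaves: a symbolic
 * name when one is known, the numeric code otherwise, and a plain
 * pointer value for anything that is not an error pointer.
 */
static int model_format_pe(char *out, size_t len, const void *p)
{
	const char *name;

	assert(out != NULL && len > 0);

	if (!model_is_err(p))
		return snprintf(out, len, "%p", p);
	name = model_errname(model_ptr_err(p));
	if (name)
		return snprintf(out, len, "%s", name);
	return snprintf(out, len, "%ld", model_ptr_err(p));
}
```

With %ld and PTR_ERR() the caller decodes the pointer itself and the
log shows only a number; with %pe the pointer is passed through and the
formatter does the decoding.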

Signed-off-by: Abdellah Ouhbi <abdououhbi1@gmail.com>
---
 fs/ext4/namei.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 4a47fbd8dd30..c0cabf172020 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -145,9 +145,9 @@ static struct buffer_head *__ext4_read_dirblock(struct inode *inode,
 	if (IS_ERR(bh)) {
 		__ext4_warning(inode->i_sb, func, line,
 			       "inode #%llu: lblock %lu: comm %s: "
-			       "error %ld reading directory block",
+			       "error %pe reading directory block",
 			       inode->i_ino, (unsigned long)block,
-			       current->comm, PTR_ERR(bh));
+			       current->comm, bh);
 
 		return bh;
 	}
-- 
2.51.0


^ permalink raw reply related

* [PATCH] ext4: Use %pe to print PTR_ERR() in extents.c
From: Abdellah Ouhbi @ 2026-04-24 15:43 UTC (permalink / raw)
  To: tytso
  Cc: linux-ext4, linux-kernel, skhan, me, linux-kernel-mentees,
	Abdellah Ouhbi

Fix coccicheck warning
./extents.c:3272:12-19: WARNING: Consider using %pe to print PTR_ERR()

Replace %ld with %pe and PTR_ERR(path) with path pointer.
The %pe specifier automatically converts error pointers to
human-readable error names instead of raw error codes.

Signed-off-by: Abdellah Ouhbi <abdououhbi1@gmail.com>
---
 fs/ext4/extents.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 125f628e738a..91c97af64b31 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3268,8 +3268,8 @@ static struct ext4_ext_path *ext4_split_extent_at(handle_t *handle,
 	 */
 	path = ext4_find_extent(inode, ee_block, NULL, flags | EXT4_EX_NOFAIL);
 	if (IS_ERR(path)) {
-		EXT4_ERROR_INODE(inode, "Failed split extent on %u, err %ld",
-				 split, PTR_ERR(path));
+		EXT4_ERROR_INODE(inode, "Failed split extent on %u, err %pe",
+				 split, path);
 		goto out_path;
 	}
 
-- 
2.51.0


^ permalink raw reply related

* [PATCH] ext4: Use %pe to print PTR_ERR() in super.c
From: Abdellah Ouhbi @ 2026-04-24 15:55 UTC (permalink / raw)
  To: tytso
  Cc: linux-ext4, linux-kernel, skhan, me, linux-kernel-mentees,
	Abdellah Ouhbi

Fix coccicheck warning
./super.c:5981:32-39: WARNING: Consider using %pe to print PTR_ERR()

Replace %ld with %pe and PTR_ERR(bdev_file) with bdev_file pointer.
The %pe specifier automatically converts error pointers to
human-readable error names instead of raw error codes.

Signed-off-by: Abdellah Ouhbi <abdououhbi1@gmail.com>
---
 fs/ext4/super.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 6a77db4d3124..4b69e0879731 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5977,8 +5977,8 @@ static struct file *ext4_get_journal_blkdev(struct super_block *sb,
 		sb, &fs_holder_ops);
 	if (IS_ERR(bdev_file)) {
 		ext4_msg(sb, KERN_ERR,
-			 "failed to open journal device unknown-block(%u,%u) %ld",
-			 MAJOR(j_dev), MINOR(j_dev), PTR_ERR(bdev_file));
+			 "failed to open journal device unknown-block(%u,%u) %pe",
+			 MAJOR(j_dev), MINOR(j_dev), bdev_file);
 		return bdev_file;
 	}
 
-- 
2.51.0


^ permalink raw reply related

* [PATCH v11 00/15] Exposing case folding behavior
From: Chuck Lever @ 2026-04-25  1:53 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Darrick J. Wong, Roland Mainz, Steve French

Following on from:

https://lore.kernel.org/linux-nfs/20251021-zypressen-bazillus-545a44af57fd@brauner/T/#m0ba197d75b7921d994cf284f3cef3a62abb11aaa

I'm attempting to implement enough support in the Linux VFS to
enable file services like NFSD and ksmbd (and user space
equivalents) to provide the actual status of case folding support
in local file systems. The default behavior for local file systems
not explicitly supported in this series is to reflect the usual
POSIX behaviors:

  case-insensitive = false
  case-nonpreserving = false

The case-insensitivity and case-nonpreserving booleans can be
consumed immediately by NFSD. These two attributes have been part of
the NFSv3 and NFSv4 protocols for decades, in order to support NFS
client implementations on non-POSIX systems.

Support for user space file servers is why this series exposes case
folding information via a user-space API. I don't know of any other
category of user-space application that requires access to case
folding info.

The Linux NFS community has a growing interest in supporting NFS
clients on Windows and macOS platforms, where file name behavior does
not align with traditional POSIX semantics.

One example of a Windows-based NFS client is [1]. This client
implementation explicitly requires servers to report
FATTR4_WORD0_CASE_INSENSITIVE = TRUE for proper operation, a hard
requirement for Windows client interoperability because Windows
applications expect case-insensitive behavior. When an NFS client
knows the server is case-insensitive, it can avoid issuing multiple
LOOKUP/READDIR requests to search for case variants, and applications
like Win32 programs work correctly without manual workarounds or
code changes.

Even the Linux client can take advantage of this information. Trond
merged patches 4 years ago [2] that introduce support for case
insensitivity, in support of the Hammerspace NFS server. In
particular, when a client detects a case-insensitive NFS share,
negative dentry caching must be disabled (a lookup for "FILE.TXT"
failing shouldn't cache a negative entry when "file.txt" exists)
and directory change invalidation must clear all cached case-folded
file name variants.

Hammerspace servers and several other NFS server implementations
operate in multi-protocol environments, where a single file service
instance caters to both NFS and SMB clients. In those cases, things
work more smoothly for everyone when the NFS client can see and adapt
to the case folding behavior that SMB users rely on and expect. NFSD
needs to support the case-insensitivity and case-nonpreserving
booleans properly in order to participate as a first-class citizen
in such environments.

[1] https://github.com/kofemann/ms-nfs41-client

[2] https://patchwork.kernel.org/project/linux-nfs/cover/20211217203658.439352-1-trondmy@kernel.org/

---
Changes since v10:
- cifs: Source case-handling flags from the server's cached
  FS_ATTRIBUTE_INFORMATION reply instead of the nocase mount
  option, with a nocase fallback when the reply is absent
- Address findings from sashiko(gemini-3) and gpt-5.5:
  - nfs: Skip pathconf case bits on NFSv4 (set via FATTR4_CASE_*
    instead)
  - xfs: Hide FS_CASEFOLD_FL from the legacy flags view so
    chattr round-trips do not hit the setflags whitelist
  - ext4, f2fs: Drop redundant fileattr_get patches; the
    FS_CASEFOLD_FL translation in fileattr_fill_flags() already
    reports FS_XFLAG_CASEFOLD for casefolded directories
  - nfsd: Report FATTR4_HOMOGENEOUS = FALSE when the exported
    filesystem has a Unicode encoding, since per-directory
    casefold makes the fs-scoped case attributes inhomogeneous
  - nfsd: Document in nfsd_get_case_info() why -ENOIOCTLCMD and
    -ENOTTY are swallowed while other errors propagate
  - fat: Honor vfat 'check=strict' when reporting FS_XFLAG_CASEFOLD
  - Set FS_CASEFOLD_FL so FS_IOC_GETFLAGS reflects case-insensitive
    mount
  - isofs: Register fileattr_get on regular file and symlink inodes,
    not just directories
  - nfsd: Query NFSv4 FATTR4_CASE_* from the parent directory for
    non-directory objects, since casefold lives on the directory

Changes since v9:
- nfs: always probe PATHCONF for case caps. Default to case-
  preserving when the server does not report case_preserving
- nfsd, ksmbd: tolerate -ENOTTY from vfs_fileattr_get() so
  overlayfs exports on backing filesystems without fileattr_get
  do not fail the RPC
- xfs: map FS_XFLAG_CASEFOLD inside xfs_ip2xflags() so BULKSTAT
  and FS_IOC_FSGETXATTR report the flag consistently
- vboxsf: reject a short host reply to SHFL_INFO_VOLUME before
  trusting volinfo.properties.case_sensitive

Changes since v8:
- Rebase on v7.0-rc1

Changes since v7:
- Split file_attr initialization changes into a separate patch

Changes since v6:
- Remove the memset from vfs_fileattr_get

Changes since v5:
- Finish the conversion to FS_XFLAGs
- NFSv4 GETATTR now clears the attr mask bit if nfsd_get_case_info()
  fails

Changes since v4:
- Observe the MSDOS "nocase" mount option
- Define new FS_XFLAGs for the user API

Changes since v3:
- Change fa->case_preserving to fa_case_nonpreserving
- VFAT is case preserving
- Make new fields available to user space

Changes since v2:
- Remove unicode labels
- Replace vfs_get_case_info
- Add support for several more local file system implementations
- Add support for in-kernel SMB server

Changes since RFC:
- Use file_getattr instead of statx
- Postpone exposing Unicode version until later
- Support NTFS and ext4 in addition to FAT
- Support NFSv4 fattr4 in addition to NFSv3 PATHCONF

---
Changes in v11:
- Link to v10: https://patch.msgid.link/20260423-case-sensitivity-v10-0-c385d674a6cf@oracle.com

---
Chuck Lever (15):
      fs: Move file_kattr initialization to callers
      fs: Add case sensitivity flags to file_kattr
      fat: Implement fileattr_get for case sensitivity
      exfat: Implement fileattr_get for case sensitivity
      ntfs3: Implement fileattr_get for case sensitivity
      hfs: Implement fileattr_get for case sensitivity
      hfsplus: Report case sensitivity in fileattr_get
      xfs: Report case sensitivity in fileattr_get
      cifs: Implement fileattr_get for case sensitivity
      nfs: Implement fileattr_get for case sensitivity
      vboxsf: Implement fileattr_get for case sensitivity
      isofs: Implement fileattr_get for case sensitivity
      nfsd: Report export case-folding via NFSv3 PATHCONF
      nfsd: Implement NFSv4 FATTR4_CASE_INSENSITIVE and FATTR4_CASE_PRESERVING
      ksmbd: Report filesystem case sensitivity via FS_ATTRIBUTE_INFORMATION

 fs/exfat/exfat_fs.h            |  2 ++
 fs/exfat/file.c                | 18 ++++++++++++--
 fs/exfat/namei.c               |  1 +
 fs/fat/fat.h                   |  3 +++
 fs/fat/file.c                  | 32 ++++++++++++++++++++++++
 fs/fat/namei_msdos.c           |  1 +
 fs/fat/namei_vfat.c            |  1 +
 fs/file_attr.c                 | 16 ++++++------
 fs/hfs/dir.c                   |  1 +
 fs/hfs/hfs_fs.h                |  2 ++
 fs/hfs/inode.c                 | 14 +++++++++++
 fs/hfsplus/inode.c             | 12 +++++++++
 fs/isofs/dir.c                 | 24 ++++++++++++++++++
 fs/isofs/inode.c               |  3 ++-
 fs/isofs/isofs.h               |  5 ++++
 fs/nfs/client.c                | 22 +++++++++++++----
 fs/nfs/inode.c                 | 23 ++++++++++++++++++
 fs/nfs/internal.h              |  3 +++
 fs/nfs/nfs3proc.c              |  2 ++
 fs/nfs/nfs3xdr.c               |  7 ++++--
 fs/nfs/nfs4proc.c              |  7 ++++--
 fs/nfs/proc.c                  |  3 +++
 fs/nfs/symlink.c               |  3 +++
 fs/nfsd/nfs3proc.c             | 18 ++++++++------
 fs/nfsd/nfs4xdr.c              | 55 +++++++++++++++++++++++++++++++++++++++---
 fs/nfsd/vfs.c                  | 43 +++++++++++++++++++++++++++++++++
 fs/nfsd/vfs.h                  |  3 +++
 fs/ntfs3/file.c                | 25 +++++++++++++++++++
 fs/ntfs3/inode.c               |  1 +
 fs/ntfs3/namei.c               |  2 ++
 fs/ntfs3/ntfs_fs.h             |  1 +
 fs/smb/client/cifsfs.c         | 42 ++++++++++++++++++++++++++++++++
 fs/smb/server/smb2pdu.c        | 30 ++++++++++++++++++-----
 fs/vboxsf/dir.c                |  1 +
 fs/vboxsf/file.c               |  6 +++--
 fs/vboxsf/super.c              |  7 ++++++
 fs/vboxsf/utils.c              | 30 +++++++++++++++++++++++
 fs/vboxsf/vfsmod.h             |  6 +++++
 fs/xfs/libxfs/xfs_inode_util.c |  2 ++
 fs/xfs/xfs_ioctl.c             |  9 ++++++-
 include/linux/fileattr.h       |  3 ++-
 include/linux/nfs_fs_sb.h      |  2 +-
 include/linux/nfs_xdr.h        |  2 ++
 include/uapi/linux/fs.h        |  7 ++++++
 44 files changed, 458 insertions(+), 42 deletions(-)
---
base-commit: 6596a02b207886e9e00bb0161c7fd59fea53c081
change-id: 20260422-case-sensitivity-5cbffc8f1558

Best regards,
--  
Chuck Lever


^ permalink raw reply

* [PATCH v11 01/15] fs: Move file_kattr initialization to callers
From: Chuck Lever @ 2026-04-25  1:53 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Darrick J. Wong, Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-0-de5619beddaf@oracle.com>

From: Chuck Lever <chuck.lever@oracle.com>

fileattr_fill_xflags() and fileattr_fill_flags() memset the
entire file_kattr struct before populating select fields, so
callers cannot pre-set fields in fa->fsx_xflags without having
their values clobbered. Darrick Wong noted that a function
named "fill_xflags" touching more than xflags forces callers
to know implementation details beyond its apparent scope.

Drop the memset from both fill functions and initialize at the
entry points instead: ioctl_setflags(), ioctl_fssetxattr(),
the file_setattr() syscall, and xfs_ioc_fsgetxattra() now
declare fa with an aggregate initializer. ioctl_getflags(),
ioctl_fsgetxattr(), and the file_getattr() syscall already
aggregate-initialize fa to pass flags_valid/fsx_valid hints
into vfs_fileattr_get().

Subsequent patches rely on this so that ->fileattr_get()
handlers can set case-sensitivity flags (FS_XFLAG_CASEFOLD,
FS_XFLAG_CASENONPRESERVING) in fa->fsx_xflags before the fill
functions run.
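
The effect can be seen in a small userspace model of
fileattr_fill_flags().  This is a sketch under stated assumptions: the
struct is a cut-down stand-in for struct file_kattr, the MODEL_*
constants are local to the example, and only one flag translation is
reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Constants local to this model. */
#define MODEL_SYNC_FL			0x00000008
#define MODEL_XFLAG_SYNC		0x00000020
#define MODEL_XFLAG_CASENONPRESERVING	0x00080000

/* Cut-down stand-in for struct file_kattr. */
struct kattr_model {
	uint32_t flags;
	uint32_t fsx_xflags;
	bool flags_valid;
};

/* Before the patch: the whole struct is zeroed first. */
static void model_fill_flags_old(struct kattr_model *fa, uint32_t flags)
{
	assert(fa != NULL);
	memset(fa, 0, sizeof(*fa));
	fa->flags_valid = true;
	fa->flags = flags;
	if (fa->flags & MODEL_SYNC_FL)
		fa->fsx_xflags |= MODEL_XFLAG_SYNC;
}

/* After the patch: no memset, so bits already in fsx_xflags survive. */
static void model_fill_flags_new(struct kattr_model *fa, uint32_t flags)
{
	assert(fa != NULL);
	fa->flags_valid = true;
	fa->flags = flags;
	if (fa->flags & MODEL_SYNC_FL)
		fa->fsx_xflags |= MODEL_XFLAG_SYNC;
}
```

A handler that sets MODEL_XFLAG_CASENONPRESERVING and then calls the
old variant loses the bit; with the new variant the bit is kept, which
is why the zero-initialization moves to the callers.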

Suggested-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/file_attr.c     | 12 ++++--------
 fs/xfs/xfs_ioctl.c |  2 +-
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/fs/file_attr.c b/fs/file_attr.c
index da983e105d70..f429da66a317 100644
--- a/fs/file_attr.c
+++ b/fs/file_attr.c
@@ -15,12 +15,10 @@
  * @fa:		fileattr pointer
  * @xflags:	FS_XFLAG_* flags
  *
- * Set ->fsx_xflags, ->fsx_valid and ->flags (translated xflags).  All
- * other fields are zeroed.
+ * Set ->fsx_xflags, ->fsx_valid and ->flags (translated xflags).
  */
 void fileattr_fill_xflags(struct file_kattr *fa, u32 xflags)
 {
-	memset(fa, 0, sizeof(*fa));
 	fa->fsx_valid = true;
 	fa->fsx_xflags = xflags;
 	if (fa->fsx_xflags & FS_XFLAG_IMMUTABLE)
@@ -48,11 +46,9 @@ EXPORT_SYMBOL(fileattr_fill_xflags);
  * @flags:	FS_*_FL flags
  *
  * Set ->flags, ->flags_valid and ->fsx_xflags (translated flags).
- * All other fields are zeroed.
  */
 void fileattr_fill_flags(struct file_kattr *fa, u32 flags)
 {
-	memset(fa, 0, sizeof(*fa));
 	fa->flags_valid = true;
 	fa->flags = flags;
 	if (fa->flags & FS_SYNC_FL)
@@ -325,7 +321,7 @@ int ioctl_setflags(struct file *file, unsigned int __user *argp)
 {
 	struct mnt_idmap *idmap = file_mnt_idmap(file);
 	struct dentry *dentry = file->f_path.dentry;
-	struct file_kattr fa;
+	struct file_kattr fa = {};
 	unsigned int flags;
 	int err;
 
@@ -357,7 +353,7 @@ int ioctl_fssetxattr(struct file *file, void __user *argp)
 {
 	struct mnt_idmap *idmap = file_mnt_idmap(file);
 	struct dentry *dentry = file->f_path.dentry;
-	struct file_kattr fa;
+	struct file_kattr fa = {};
 	int err;
 
 	err = copy_fsxattr_from_user(&fa, argp);
@@ -431,7 +427,7 @@ SYSCALL_DEFINE5(file_setattr, int, dfd, const char __user *, filename,
 	struct path filepath __free(path_put) = {};
 	unsigned int lookup_flags = 0;
 	struct file_attr fattr;
-	struct file_kattr fa;
+	struct file_kattr fa = {};
 	int error;
 
 	BUILD_BUG_ON(sizeof(struct file_attr) < FILE_ATTR_SIZE_VER0);
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 46e234863644..ed9b4846c05f 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -517,7 +517,7 @@ xfs_ioc_fsgetxattra(
 	xfs_inode_t		*ip,
 	void			__user *arg)
 {
-	struct file_kattr	fa;
+	struct file_kattr	fa = {};
 
 	xfs_ilock(ip, XFS_ILOCK_SHARED);
 	xfs_fill_fsxattr(ip, XFS_ATTR_FORK, &fa);

-- 
2.53.0


^ permalink raw reply related

* [PATCH v11 02/15] fs: Add case sensitivity flags to file_kattr
From: Chuck Lever @ 2026-04-25  1:53 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Darrick J. Wong, Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-0-de5619beddaf@oracle.com>

From: Chuck Lever <chuck.lever@oracle.com>

Enable upper layers such as NFSD to retrieve case sensitivity
information from file systems by adding FS_XFLAG_CASEFOLD and
FS_XFLAG_CASENONPRESERVING flags.

Filesystems report case-insensitive or case-nonpreserving behavior
by setting these flags directly in fa->fsx_xflags. The default
(flags unset) indicates POSIX semantics: case-sensitive and
case-preserving. Both flags are added to FS_XFLAG_RDONLY_MASK so
FS_IOC_FSSETXATTR silently strips them, keeping the new xflags
strictly a reporting interface. Callers that want to toggle
casefolding continue to use FS_IOC_SETFLAGS with FS_CASEFOLD_FL,
the established UAPI on filesystems that support the operation
(ext4 and f2fs on empty directories).

Case sensitivity information is exported to userspace via the
fsx_xflags field of the FS_IOC_FSGETXATTR ioctl and the fa_xflags
field of the file_getattr() system call.
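
A hedged sketch of how a user-space file server might consume the new
bits through FS_IOC_FSGETXATTR.  The two flag values are copied from
this patch and defined locally in case the installed headers predate
it; struct case_info, decode_case_xflags() and query_case_info() are
hypothetical helper names, not part of any existing API.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

/* Values from this patch, for headers that do not define them yet. */
#ifndef FS_XFLAG_CASEFOLD
#define FS_XFLAG_CASEFOLD		0x00040000
#endif
#ifndef FS_XFLAG_CASENONPRESERVING
#define FS_XFLAG_CASENONPRESERVING	0x00080000
#endif

/* Hypothetical result type for this example. */
struct case_info {
	int case_insensitive;
	int case_preserving;
};

/* Neither bit set means POSIX behavior: sensitive and preserving. */
static struct case_info decode_case_xflags(uint32_t xflags)
{
	struct case_info ci;

	ci.case_insensitive = (xflags & FS_XFLAG_CASEFOLD) != 0;
	ci.case_preserving = (xflags & FS_XFLAG_CASENONPRESERVING) == 0;
	return ci;
}

/* Query a path; returns 0 on success or a negative errno value. */
int query_case_info(const char *path, struct case_info *out)
{
	struct fsxattr fsx;
	int fd, ret;

	assert(path != NULL && out != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;
	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0) {
		ret = -errno;	/* save errno before close() can change it */
		close(fd);
		return ret;
	}
	close(fd);
	*out = decode_case_xflags(fsx.fsx_xflags);
	return 0;
}
```

On a kernel or file system that does not report these flags, both bits
read as zero and the result decodes to the POSIX default.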

Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/file_attr.c           | 4 ++++
 include/linux/fileattr.h | 3 ++-
 include/uapi/linux/fs.h  | 7 +++++++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/file_attr.c b/fs/file_attr.c
index f429da66a317..bfb00d256dd5 100644
--- a/fs/file_attr.c
+++ b/fs/file_attr.c
@@ -37,6 +37,8 @@ void fileattr_fill_xflags(struct file_kattr *fa, u32 xflags)
 		fa->flags |= FS_PROJINHERIT_FL;
 	if (fa->fsx_xflags & FS_XFLAG_VERITY)
 		fa->flags |= FS_VERITY_FL;
+	if (fa->fsx_xflags & FS_XFLAG_CASEFOLD)
+		fa->flags |= FS_CASEFOLD_FL;
 }
 EXPORT_SYMBOL(fileattr_fill_xflags);
 
@@ -67,6 +69,8 @@ void fileattr_fill_flags(struct file_kattr *fa, u32 flags)
 		fa->fsx_xflags |= FS_XFLAG_PROJINHERIT;
 	if (fa->flags & FS_VERITY_FL)
 		fa->fsx_xflags |= FS_XFLAG_VERITY;
+	if (fa->flags & FS_CASEFOLD_FL)
+		fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
 }
 EXPORT_SYMBOL(fileattr_fill_flags);
 
diff --git a/include/linux/fileattr.h b/include/linux/fileattr.h
index 3780904a63a6..58044b598016 100644
--- a/include/linux/fileattr.h
+++ b/include/linux/fileattr.h
@@ -16,7 +16,8 @@
 
 /* Read-only inode flags */
 #define FS_XFLAG_RDONLY_MASK \
-	(FS_XFLAG_PREALLOC | FS_XFLAG_HASATTR | FS_XFLAG_VERITY)
+	(FS_XFLAG_PREALLOC | FS_XFLAG_HASATTR | FS_XFLAG_VERITY | \
+	 FS_XFLAG_CASEFOLD | FS_XFLAG_CASENONPRESERVING)
 
 /* Flags to indicate valid value of fsx_ fields */
 #define FS_XFLAG_VALUES_MASK \
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 13f71202845e..2ea4c81df08f 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -254,6 +254,13 @@ struct file_attr {
 #define FS_XFLAG_DAX		0x00008000	/* use DAX for IO */
 #define FS_XFLAG_COWEXTSIZE	0x00010000	/* CoW extent size allocator hint */
 #define FS_XFLAG_VERITY		0x00020000	/* fs-verity enabled */
+/*
+ * Case handling flags (read-only, cannot be set via ioctl).
+ * Default (neither set) indicates POSIX semantics: case-sensitive
+ * lookups and case-preserving storage.
+ */
+#define FS_XFLAG_CASEFOLD	0x00040000	/* case-insensitive lookups */
+#define FS_XFLAG_CASENONPRESERVING 0x00080000	/* case not preserved */
 #define FS_XFLAG_HASATTR	0x80000000	/* no DIFLAG for this	*/
 
 /* the read-only stuff doesn't really belong here, but any other place is

-- 
2.53.0


^ permalink raw reply related

* [PATCH v11 03/15] fat: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-04-25  1:53 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-0-de5619beddaf@oracle.com>

From: Chuck Lever <chuck.lever@oracle.com>

Report FAT's case sensitivity behavior via the FS_XFLAG_CASEFOLD
and FS_XFLAG_CASENONPRESERVING flags. FAT filesystems are
case-insensitive by default.

MSDOS supports a 'nocase' mount option that enables case-sensitive
behavior; check this option when reporting case sensitivity.

VFAT long filename entries preserve case; without VFAT, only
uppercased 8.3 short names are stored. MSDOS with 'nocase' also
preserves case since the name-formatting code skips upcasing when
'nocase' is set. Check both options when reporting case preservation.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/fat/fat.h         |  3 +++
 fs/fat/file.c        | 32 ++++++++++++++++++++++++++++++++
 fs/fat/namei_msdos.c |  1 +
 fs/fat/namei_vfat.c  |  1 +
 4 files changed, 37 insertions(+)

diff --git a/fs/fat/fat.h b/fs/fat/fat.h
index 5a58f0bf8ce8..99ed9228a677 100644
--- a/fs/fat/fat.h
+++ b/fs/fat/fat.h
@@ -10,6 +10,8 @@
 #include <linux/fs_context.h>
 #include <linux/fs_parser.h>
 
+struct file_kattr;
+
 /*
  * vfat shortname flags
  */
@@ -408,6 +410,7 @@ extern void fat_truncate_blocks(struct inode *inode, loff_t offset);
 extern int fat_getattr(struct mnt_idmap *idmap,
 		       const struct path *path, struct kstat *stat,
 		       u32 request_mask, unsigned int flags);
+int fat_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
 extern int fat_file_fsync(struct file *file, loff_t start, loff_t end,
 			  int datasync);
 
diff --git a/fs/fat/file.c b/fs/fat/file.c
index becccdd2e501..5f0178fc2ede 100644
--- a/fs/fat/file.c
+++ b/fs/fat/file.c
@@ -17,6 +17,7 @@
 #include <linux/fsnotify.h>
 #include <linux/security.h>
 #include <linux/falloc.h>
+#include <linux/fileattr.h>
 #include "fat.h"
 
 static long fat_fallocate(struct file *file, int mode,
@@ -398,6 +399,36 @@ void fat_truncate_blocks(struct inode *inode, loff_t offset)
 	fat_flush_inodes(inode->i_sb, inode, NULL);
 }
 
+int fat_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+	struct msdos_sb_info *sbi = MSDOS_SB(dentry->d_sb);
+	bool case_sensitive;
+
+	/*
+	 * FAT filesystems are case-insensitive by default. VFAT
+	 * becomes case-sensitive when mounted with 'check=strict',
+	 * which installs vfat_dentry_ops. MSDOS has no such option;
+	 * its 'nocase' mount option selects case-sensitive matching.
+	 *
+	 * VFAT long filename entries preserve case. Without VFAT, only
+	 * uppercased 8.3 short names are stored. MSDOS with 'nocase'
+	 * also preserves case.
+	 */
+	if (sbi->options.isvfat)
+		case_sensitive = sbi->options.name_check == 's';
+	else
+		case_sensitive = sbi->options.nocase;
+
+	if (!case_sensitive) {
+		fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+		fa->flags |= FS_CASEFOLD_FL;
+		if (!sbi->options.isvfat)
+			fa->fsx_xflags |= FS_XFLAG_CASENONPRESERVING;
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(fat_fileattr_get);
+
 int fat_getattr(struct mnt_idmap *idmap, const struct path *path,
 		struct kstat *stat, u32 request_mask, unsigned int flags)
 {
@@ -575,5 +606,6 @@ EXPORT_SYMBOL_GPL(fat_setattr);
 const struct inode_operations fat_file_inode_operations = {
 	.setattr	= fat_setattr,
 	.getattr	= fat_getattr,
+	.fileattr_get	= fat_fileattr_get,
 	.update_time	= fat_update_time,
 };
diff --git a/fs/fat/namei_msdos.c b/fs/fat/namei_msdos.c
index 4cc65f330fb7..0fd2971ad4b1 100644
--- a/fs/fat/namei_msdos.c
+++ b/fs/fat/namei_msdos.c
@@ -644,6 +644,7 @@ static const struct inode_operations msdos_dir_inode_operations = {
 	.rename		= msdos_rename,
 	.setattr	= fat_setattr,
 	.getattr	= fat_getattr,
+	.fileattr_get	= fat_fileattr_get,
 	.update_time	= fat_update_time,
 };
 
diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 918b3756674c..e909447873e3 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -1185,6 +1185,7 @@ static const struct inode_operations vfat_dir_inode_operations = {
 	.rename		= vfat_rename2,
 	.setattr	= fat_setattr,
 	.getattr	= fat_getattr,
+	.fileattr_get	= fat_fileattr_get,
 	.update_time	= fat_update_time,
 };
 

-- 
2.53.0


^ permalink raw reply related

* [PATCH v11 04/15] exfat: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-04-25  1:53 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-0-de5619beddaf@oracle.com>

From: Chuck Lever <chuck.lever@oracle.com>

Report exFAT's case sensitivity behavior via the FS_XFLAG_CASEFOLD
flag. exFAT is always case-insensitive (using an upcase table for
comparison) and always preserves case at rest.

Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/exfat/exfat_fs.h |  2 ++
 fs/exfat/file.c     | 18 ++++++++++++++++--
 fs/exfat/namei.c    |  1 +
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/fs/exfat/exfat_fs.h b/fs/exfat/exfat_fs.h
index 89ef5368277f..aff4dcd4e75a 100644
--- a/fs/exfat/exfat_fs.h
+++ b/fs/exfat/exfat_fs.h
@@ -496,6 +496,8 @@ int exfat_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 int exfat_getattr(struct mnt_idmap *idmap, const struct path *path,
 		  struct kstat *stat, unsigned int request_mask,
 		  unsigned int query_flags);
+struct file_kattr;
+int exfat_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
 int exfat_file_fsync(struct file *file, loff_t start, loff_t end, int datasync);
 long exfat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
 long exfat_compat_ioctl(struct file *filp, unsigned int cmd,
diff --git a/fs/exfat/file.c b/fs/exfat/file.c
index 354bdcfe4abc..91e5511945d1 100644
--- a/fs/exfat/file.c
+++ b/fs/exfat/file.c
@@ -14,6 +14,7 @@
 #include <linux/writeback.h>
 #include <linux/filelock.h>
 #include <linux/falloc.h>
+#include <linux/fileattr.h>
 
 #include "exfat_raw.h"
 #include "exfat_fs.h"
@@ -323,6 +324,18 @@ int exfat_getattr(struct mnt_idmap *idmap, const struct path *path,
 	return 0;
 }
 
+int exfat_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+	/*
+	 * exFAT compares filenames through an upcase table, so lookup
+	 * is always case-insensitive. Long names are stored in UTF-16
+	 * with case intact; CASENONPRESERVING stays clear.
+	 */
+	fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+	fa->flags |= FS_CASEFOLD_FL;
+	return 0;
+}
+
 int exfat_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 		  struct iattr *attr)
 {
@@ -817,6 +830,7 @@ const struct file_operations exfat_file_operations = {
 };
 
 const struct inode_operations exfat_file_inode_operations = {
-	.setattr     = exfat_setattr,
-	.getattr     = exfat_getattr,
+	.setattr	= exfat_setattr,
+	.getattr	= exfat_getattr,
+	.fileattr_get	= exfat_fileattr_get,
 };
diff --git a/fs/exfat/namei.c b/fs/exfat/namei.c
index 2c5636634b4a..94002e43db08 100644
--- a/fs/exfat/namei.c
+++ b/fs/exfat/namei.c
@@ -1311,4 +1311,5 @@ const struct inode_operations exfat_dir_inode_operations = {
 	.rename		= exfat_rename,
 	.setattr	= exfat_setattr,
 	.getattr	= exfat_getattr,
+	.fileattr_get	= exfat_fileattr_get,
 };

-- 
2.53.0


^ permalink raw reply related

* [PATCH v11 05/15] ntfs3: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-04-25  1:53 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-0-de5619beddaf@oracle.com>

From: Chuck Lever <chuck.lever@oracle.com>

Report NTFS case sensitivity behavior via the FS_XFLAG_CASEFOLD
flag. NTFS always preserves case at rest; lookups are
case-insensitive only when the volume is mounted with 'nocase'.

Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/ntfs3/file.c    | 25 +++++++++++++++++++++++++
 fs/ntfs3/inode.c   |  1 +
 fs/ntfs3/namei.c   |  2 ++
 fs/ntfs3/ntfs_fs.h |  1 +
 4 files changed, 29 insertions(+)

diff --git a/fs/ntfs3/file.c b/fs/ntfs3/file.c
index b041639ab406..447ea0f9b9d5 100644
--- a/fs/ntfs3/file.c
+++ b/fs/ntfs3/file.c
@@ -180,6 +180,30 @@ long ntfs_compat_ioctl(struct file *filp, u32 cmd, unsigned long arg)
 }
 #endif
 
+/*
+ * ntfs_fileattr_get - inode_operations::fileattr_get
+ */
+int ntfs_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+	struct inode *inode = d_inode(dentry);
+	struct ntfs_sb_info *sbi = inode->i_sb->s_fs_info;
+
+	/* Avoid any operation if inode is bad. */
+	if (unlikely(is_bad_ni(ntfs_i(inode))))
+		return -EINVAL;
+
+	/*
+	 * NTFS always preserves case at rest. Case sensitivity depends
+	 * on mount options: with "nocase", lookups are case-insensitive;
+	 * otherwise they are case-sensitive.
+	 */
+	if (sbi->options->nocase) {
+		fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+		fa->flags |= FS_CASEFOLD_FL;
+	}
+	return 0;
+}
+
 /*
  * ntfs_getattr - inode_operations::getattr
  */
@@ -1547,6 +1571,7 @@ const struct inode_operations ntfs_file_inode_operations = {
 	.get_acl	= ntfs_get_acl,
 	.set_acl	= ntfs_set_acl,
 	.fiemap		= ntfs_fiemap,
+	.fileattr_get	= ntfs_fileattr_get,
 };
 
 const struct file_operations ntfs_file_operations = {
diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c
index 42af1abe17f8..a5ff04c2efd3 100644
--- a/fs/ntfs3/inode.c
+++ b/fs/ntfs3/inode.c
@@ -2095,6 +2095,7 @@ const struct inode_operations ntfs_link_inode_operations = {
 	.get_link	= ntfs_get_link,
 	.setattr	= ntfs_setattr,
 	.listxattr	= ntfs_listxattr,
+	.fileattr_get	= ntfs_fileattr_get,
 };
 
 const struct address_space_operations ntfs_aops = {
diff --git a/fs/ntfs3/namei.c b/fs/ntfs3/namei.c
index b2af8f695e60..eb241d7796ba 100644
--- a/fs/ntfs3/namei.c
+++ b/fs/ntfs3/namei.c
@@ -518,6 +518,7 @@ const struct inode_operations ntfs_dir_inode_operations = {
 	.getattr	= ntfs_getattr,
 	.listxattr	= ntfs_listxattr,
 	.fiemap		= ntfs_fiemap,
+	.fileattr_get	= ntfs_fileattr_get,
 };
 
 const struct inode_operations ntfs_special_inode_operations = {
@@ -526,6 +527,7 @@ const struct inode_operations ntfs_special_inode_operations = {
 	.listxattr	= ntfs_listxattr,
 	.get_acl	= ntfs_get_acl,
 	.set_acl	= ntfs_set_acl,
+	.fileattr_get	= ntfs_fileattr_get,
 };
 
 const struct dentry_operations ntfs_dentry_ops = {
diff --git a/fs/ntfs3/ntfs_fs.h b/fs/ntfs3/ntfs_fs.h
index bbf3b6a1dcbe..41db22d652c4 100644
--- a/fs/ntfs3/ntfs_fs.h
+++ b/fs/ntfs3/ntfs_fs.h
@@ -529,6 +529,7 @@ bool dir_is_empty(struct inode *dir);
 extern const struct file_operations ntfs_dir_operations;
 
 /* Globals from file.c */
+int ntfs_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
 int ntfs_getattr(struct mnt_idmap *idmap, const struct path *path,
 		 struct kstat *stat, u32 request_mask, u32 flags);
 int ntfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry,

-- 
2.53.0


^ permalink raw reply related

* [PATCH v11 06/15] hfs: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-04-25  1:53 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-0-de5619beddaf@oracle.com>

From: Chuck Lever <chuck.lever@oracle.com>

Report HFS case sensitivity behavior via the FS_XFLAG_CASEFOLD
flag. HFS is always case-insensitive (using Mac OS Roman case
folding) and always preserves case at rest.

Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/hfs/dir.c    |  1 +
 fs/hfs/hfs_fs.h |  2 ++
 fs/hfs/inode.c  | 14 ++++++++++++++
 3 files changed, 17 insertions(+)

diff --git a/fs/hfs/dir.c b/fs/hfs/dir.c
index f5e7efe924e7..c4c6e1623f55 100644
--- a/fs/hfs/dir.c
+++ b/fs/hfs/dir.c
@@ -328,4 +328,5 @@ const struct inode_operations hfs_dir_inode_operations = {
 	.rmdir		= hfs_remove,
 	.rename		= hfs_rename,
 	.setattr	= hfs_inode_setattr,
+	.fileattr_get	= hfs_fileattr_get,
 };
diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
index ac0e83f77a0f..1b23448c9a48 100644
--- a/fs/hfs/hfs_fs.h
+++ b/fs/hfs/hfs_fs.h
@@ -177,6 +177,8 @@ extern int hfs_get_block(struct inode *inode, sector_t block,
 extern const struct address_space_operations hfs_aops;
 extern const struct address_space_operations hfs_btree_aops;
 
+struct file_kattr;
+int hfs_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
 int hfs_write_begin(const struct kiocb *iocb, struct address_space *mapping,
 		    loff_t pos, unsigned int len, struct folio **foliop,
 		    void **fsdata);
diff --git a/fs/hfs/inode.c b/fs/hfs/inode.c
index 89b33a9d46d5..f41cc261684d 100644
--- a/fs/hfs/inode.c
+++ b/fs/hfs/inode.c
@@ -18,6 +18,7 @@
 #include <linux/uio.h>
 #include <linux/xattr.h>
 #include <linux/blkdev.h>
+#include <linux/fileattr.h>
 
 #include "hfs_fs.h"
 #include "btree.h"
@@ -699,6 +700,18 @@ static int hfs_file_fsync(struct file *filp, loff_t start, loff_t end,
 	return ret;
 }
 
+int hfs_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+	/*
+	 * HFS compares filenames using Mac OS Roman case folding, so
+	 * lookup is always case-insensitive. Names are stored on disk
+	 * with case intact; CASENONPRESERVING stays clear.
+	 */
+	fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+	fa->flags |= FS_CASEFOLD_FL;
+	return 0;
+}
+
 static const struct file_operations hfs_file_operations = {
 	.llseek		= generic_file_llseek,
 	.read_iter	= generic_file_read_iter,
@@ -715,4 +728,5 @@ static const struct inode_operations hfs_file_inode_operations = {
 	.lookup		= hfs_file_lookup,
 	.setattr	= hfs_inode_setattr,
 	.listxattr	= generic_listxattr,
+	.fileattr_get	= hfs_fileattr_get,
 };

-- 
2.53.0


^ permalink raw reply related

* [PATCH v11 07/15] hfsplus: Report case sensitivity in fileattr_get
From: Chuck Lever @ 2026-04-25  1:53 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-0-de5619beddaf@oracle.com>

From: Chuck Lever <chuck.lever@oracle.com>

Add case sensitivity reporting to the existing hfsplus_fileattr_get()
function via the FS_XFLAG_CASEFOLD flag. HFS+ always preserves case
at rest.

Case sensitivity depends on how the volume was formatted: HFSX
volumes may be either case-sensitive or case-insensitive, indicated
by the HFSPLUS_SB_CASEFOLD superblock flag.

Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/hfsplus/inode.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c
index d05891ec492e..38b6eb659a79 100644
--- a/fs/hfsplus/inode.c
+++ b/fs/hfsplus/inode.c
@@ -740,6 +740,7 @@ int hfsplus_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
 {
 	struct inode *inode = d_inode(dentry);
 	struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
+	struct hfsplus_sb_info *sbi = HFSPLUS_SB(inode->i_sb);
 	unsigned int flags = 0;
 
 	if (inode->i_flags & S_IMMUTABLE)
@@ -751,6 +752,17 @@ int hfsplus_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
 
 	fileattr_fill_flags(fa, flags);
 
+	/*
+	 * HFS+ always preserves case at rest. Standard HFS+ volumes
+	 * are case-insensitive; HFSX volumes may be either
+	 * case-sensitive or case-insensitive depending on how they
+	 * were formatted. HFSPLUS_SB_CASEFOLD is set in both
+	 * case-insensitive variants.
+	 */
+	if (test_bit(HFSPLUS_SB_CASEFOLD, &sbi->flags)) {
+		fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+		fa->flags |= FS_CASEFOLD_FL;
+	}
 	return 0;
 }
 

-- 
2.53.0


^ permalink raw reply related

* [PATCH v11 08/15] xfs: Report case sensitivity in fileattr_get
From: Chuck Lever @ 2026-04-25  1:53 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-0-de5619beddaf@oracle.com>

From: Chuck Lever <chuck.lever@oracle.com>

Upper layers such as NFSD need to query whether a filesystem
is case-sensitive. Add FS_XFLAG_CASEFOLD to xfs_ip2xflags()
when the filesystem is formatted with the ASCIICI feature
flag. This serves both FS_IOC_FSGETXATTR (via xfs_fill_fsxattr() in
xfs_fileattr_get()) and XFS_IOC_BULKSTAT (which populates bs_xflags
directly from xfs_ip2xflags()), so bulkstat consumers and per-inode
queries see a consistent view of the filesystem's case-folding
behavior.

XFS always preserves case. XFS is case-sensitive by default, but
supports ASCII case-insensitive lookups when formatted with the
ASCIICI feature flag.

Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/xfs/libxfs/xfs_inode_util.c | 2 ++
 fs/xfs/xfs_ioctl.c             | 7 +++++++
 2 files changed, 9 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_inode_util.c b/fs/xfs/libxfs/xfs_inode_util.c
index 551fa51befb6..82be54b6f8d3 100644
--- a/fs/xfs/libxfs/xfs_inode_util.c
+++ b/fs/xfs/libxfs/xfs_inode_util.c
@@ -130,6 +130,8 @@ xfs_ip2xflags(
 
 	if (xfs_inode_has_attr_fork(ip))
 		flags |= FS_XFLAG_HASATTR;
+	if (xfs_has_asciici(ip->i_mount))
+		flags |= FS_XFLAG_CASEFOLD;
 	return flags;
 }
 
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index ed9b4846c05f..5a58fb0bad2b 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -472,6 +472,13 @@ xfs_fill_fsxattr(
 
 	fileattr_fill_xflags(fa, xfs_ip2xflags(ip));
 
+	/*
+	 * FS_XFLAG_CASEFOLD is read-only; hide it from the legacy
+	 * flags view so chattr's RMW cycle does not pass it back to
+	 * xfs_fileattr_set().
+	 */
+	fa->flags &= ~FS_CASEFOLD_FL;
+
 	if (ip->i_diflags & XFS_DIFLAG_EXTSIZE) {
 		fa->fsx_extsize = XFS_FSB_TO_B(mp, ip->i_extsize);
 	} else if (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) {

-- 
2.53.0


^ permalink raw reply related

* [PATCH v11 09/15] cifs: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-04-25  1:53 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Steve French, Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-0-de5619beddaf@oracle.com>

From: Chuck Lever <chuck.lever@oracle.com>

Upper layers such as NFSD need a way to query whether a filesystem
handles filenames in a case-sensitive manner. Report CIFS/SMB case
handling behavior via FS_XFLAG_CASEFOLD and
FS_XFLAG_CASENONPRESERVING.

The authoritative source is the server itself: at mount time CIFS
issues QueryFSInfo(FS_ATTRIBUTE_INFORMATION) and caches the reply
on the tcon. That reply carries FILE_CASE_SENSITIVE_SEARCH and
FILE_CASE_PRESERVED_NAMES, which reflect whatever case handling
the share actually implements after SMB3.1.1 POSIX extensions
negotiation. Translating those two bits into the VFS flags lets
cifs_fileattr_get report what the server advertises rather than
what the client was asked to pretend.

QueryFSInfo is best-effort; the mount completes even if the server
does not answer. MaxPathNameComponentLength is zero in that case
and is used as the "no reply received" sentinel. When no reply is
available, fall back to the nocase mount option so that the reported
behavior agrees with the dentry comparison operations installed on
the superblock.
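
The precedence between the server reply and the nocase fallback can
be sketched in userspace as follows. This is a simplified model under
stated assumptions, not the client code: struct demo_tcon,
demo_cifs_xflags() and the DEMO_XFLAG_* values are hypothetical. The
two DEMO_FILE_* values follow the attribute bits defined in MS-FSCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* File system attribute bits as defined by MS-FSCC. */
#define DEMO_FILE_CASE_SENSITIVE_SEARCH 0x00000001u
#define DEMO_FILE_CASE_PRESERVED_NAMES  0x00000002u

/* Placeholder bit values; the real FS_XFLAG_* constants live in the uapi headers. */
#define DEMO_XFLAG_CASEFOLD          (1u << 0)
#define DEMO_XFLAG_CASENONPRESERVING (1u << 1)

/* Hypothetical, CPU-endian mirror of the state cached on the tcon. */
struct demo_tcon {
	uint32_t fs_attributes;     /* FS_ATTRIBUTE_INFORMATION attribute bits */
	uint32_t max_component_len; /* zero means no reply was cached */
	bool nocase;                /* client-side 'nocase' mount option */
};

/* Return the xflags this model would report for the share. */
uint32_t demo_cifs_xflags(const struct demo_tcon *t)
{
	uint32_t xflags = 0;

	assert(t != NULL);

	/* No server reply: agree with the dentry comparison behavior. */
	if (t->max_component_len == 0) {
		if (t->nocase)
			xflags |= DEMO_XFLAG_CASEFOLD;
		return xflags;
	}

	/* Server reply available: it overrides the mount option. */
	if (!(t->fs_attributes & DEMO_FILE_CASE_SENSITIVE_SEARCH))
		xflags |= DEMO_XFLAG_CASEFOLD;
	if (!(t->fs_attributes & DEMO_FILE_CASE_PRESERVED_NAMES))
		xflags |= DEMO_XFLAG_CASENONPRESERVING;
	return xflags;
}
```

Note that in this model a share advertising both attribute bits
reports no xflags even when mounted with nocase, because the server
reply takes precedence whenever it is present.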

The callback is registered in all three inode_operations structures
(directory, file, and symlink) to ensure consistent reporting across
all inode types.

Registering fileattr_get routes FS_IOC_GETFLAGS through
vfs_fileattr_get() and short-circuits the syscall's fallback to
cifs_ioctl(). That fallback invoked CIFSGetExtAttr() under
CONFIG_CIFS_POSIX and CONFIG_CIFS_ALLOW_INSECURE_LEGACY on servers
advertising CIFS_UNIX_EXTATTR_CAP, surfacing the SMB1 Unix-extension
immutable, append, and nodump bits. cifs_fileattr_get carries over
only FS_COMPR_FL from cached cifsAttrs; the SMB1 extattr fetch is
not reproduced. SMB1 is deprecated, and acquiring a netfid from
within a dentry-only callback is not worth preserving a path tied
to an insecure legacy dialect.

Acked-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/smb/client/cifsfs.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
index 2025739f070a..d71755b59b5b 100644
--- a/fs/smb/client/cifsfs.c
+++ b/fs/smb/client/cifsfs.c
@@ -30,6 +30,7 @@
 #include <linux/xattr.h>
 #include <linux/mm.h>
 #include <linux/key-type.h>
+#include <linux/fileattr.h>
 #include <uapi/linux/magic.h>
 #include <net/ipv6.h>
 #include "cifsfs.h"
@@ -1199,6 +1200,44 @@ struct file_system_type smb3_fs_type = {
 MODULE_ALIAS_FS("smb3");
 MODULE_ALIAS("smb3");

+static int cifs_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+	struct cifs_sb_info *cifs_sb = CIFS_SB(dentry->d_sb);
+	struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb);
+	u32 attrs = le32_to_cpu(tcon->fsAttrInfo.Attributes);
+
+	/* Preserve FS_COMPR_FL previously reported by cifs_ioctl(). */
+	if (CIFS_I(d_inode(dentry))->cifsAttrs & ATTR_COMPRESSED)
+		fa->flags |= FS_COMPR_FL;
+
+	/*
+	 * The server's FS_ATTRIBUTE_INFORMATION response, cached on
+	 * the tcon at mount, reflects the share's case-handling
+	 * semantics after any POSIX extensions negotiation. Prefer
+	 * it over the client-local nocase mount option, which only
+	 * governs dentry comparison on this superblock.
+	 *
+	 * QueryFSInfo is best-effort at mount; when it did not
+	 * populate fsAttrInfo, MaxPathNameComponentLength remains
+	 * zero. In that case fall back to nocase so the reporting
+	 * matches the comparison behavior installed on the sb.
+	 */
+	if (le32_to_cpu(tcon->fsAttrInfo.MaxPathNameComponentLength) == 0) {
+		if (tcon->nocase) {
+			fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+			fa->flags |= FS_CASEFOLD_FL;
+		}
+		return 0;
+	}
+	if (!(attrs & FILE_CASE_SENSITIVE_SEARCH)) {
+		fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+		fa->flags |= FS_CASEFOLD_FL;
+	}
+	if (!(attrs & FILE_CASE_PRESERVED_NAMES))
+		fa->fsx_xflags |= FS_XFLAG_CASENONPRESERVING;
+	return 0;
+}
+
 const struct inode_operations cifs_dir_inode_ops = {
 	.create = cifs_create,
 	.atomic_open = cifs_atomic_open,
@@ -1217,6 +1256,7 @@ const struct inode_operations cifs_dir_inode_ops = {
 	.listxattr = cifs_listxattr,
 	.get_acl = cifs_get_acl,
 	.set_acl = cifs_set_acl,
+	.fileattr_get = cifs_fileattr_get,
 };

 const struct inode_operations cifs_file_inode_ops = {
@@ -1227,6 +1267,7 @@ const struct inode_operations cifs_file_inode_ops = {
 	.fiemap = cifs_fiemap,
 	.get_acl = cifs_get_acl,
 	.set_acl = cifs_set_acl,
+	.fileattr_get = cifs_fileattr_get,
 };

 const char *cifs_get_link(struct dentry *dentry, struct inode *inode,
@@ -1261,6 +1302,7 @@ const struct inode_operations cifs_symlink_inode_ops = {
 	.setattr = cifs_setattr,
 	.permission = cifs_permission,
 	.listxattr = cifs_listxattr,
+	.fileattr_get = cifs_fileattr_get,
 };

 /*

-- 
2.53.0

^ permalink raw reply related

* [PATCH v11 10/15] nfs: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-04-25  1:53 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Jan Kara
  Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
	linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
	yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
	tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-0-de5619beddaf@oracle.com>

From: Chuck Lever <chuck.lever@oracle.com>

An NFS server re-exporting an NFS mount point needs to report
the case sensitivity behavior of the underlying filesystem to
its clients. NFSD's attribute encoder obtains that information
by calling vfs_fileattr_get() on the lower filesystem, so the
NFS client must implement fileattr_get to surface what it
learned from its own server.

The NFS client already retrieves case sensitivity information
from servers during mount via PATHCONF (NFSv3) or the
FATTR4_CASE_INSENSITIVE/FATTR4_CASE_PRESERVING attributes
(NFSv4). Expose this information through fileattr_get by
reporting the FS_XFLAG_CASEFOLD and FS_XFLAG_CASENONPRESERVING
flags. NFSv2 lacks PATHCONF support, so mounts using that protocol
version default to standard POSIX behavior: case-sensitive and
case-preserving.
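
For NFSv3, the two booleans are the last two of the six 32-bit XDR
words that follow the post-op attributes in a PATHCONF3resok body
(RFC 1813). A minimal userspace sketch of that decode is shown
below; struct demo_pathconf, demo_be32() and demo_decode_pathconf()
are hypothetical names, and the buffer is assumed to start at the
linkmax word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PATHCONF_WORDS 6

/* Hypothetical subset of the decoded PATHCONF result. */
struct demo_pathconf {
	uint32_t max_link;
	uint32_t max_namelen;
	bool case_insensitive;
	bool case_preserving;
};

/* Read one big-endian 32-bit XDR word. */
static uint32_t demo_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode the six words; returns 0 on success, -1 if the buffer is short. */
int demo_decode_pathconf(const uint8_t *buf, size_t len,
			 struct demo_pathconf *out)
{
	assert(buf != NULL && out != NULL);

	if (len < DEMO_PATHCONF_WORDS * 4)
		return -1;

	out->max_link = demo_be32(buf + 0);
	out->max_namelen = demo_be32(buf + 4);
	/* Words 2 and 3 (no_trunc, chown_restricted) are skipped. */
	out->case_insensitive = demo_be32(buf + 16) != 0;
	out->case_preserving = demo_be32(buf + 20) != 0;
	return 0;
}
```

Any nonzero word is treated as true here, which mirrors the "!= 0"
comparisons in the nfs3xdr.c hunk of this patch.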

PATHCONF is now invoked unconditionally for NFSv2 and NFSv3 mounts
so the case-sensitivity capabilities are established even when
the user pins server->namelen with the namlen= mount option. That
option is orthogonal to case handling, and skipping PATHCONF
because namelen was already known would leave the caps unset.

The two capability bits carry opposite polarity relative to the
protocol attributes because the POSIX defaults differ: a POSIX
filesystem is case-sensitive (case_insensitive is false) and
case-preserving (case_preserving is true). Each bit therefore flags
a departure from the default, so most servers match "neither xflag
set." NFS_CAP_CASE_INSENSITIVE is set only when the server affirms
case insensitivity, so "server said no" and "server did not answer"
both collapse to the case-sensitive default.
NFS_CAP_CASE_NONPRESERVING is set only when the server affirms that
it does not preserve case, so that silence or a missing attribute
lands on the case-preserving default. The NFSv4 probe checks
res.attr_bitmask[0] to distinguish "server said false" from "server
omitted the attribute" before setting the bit.
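
The polarity rule can be sketched as a small userspace model. The cap
bit positions are taken from the nfs_fs_sb.h hunk in this patch;
struct demo_attr, struct demo_case_probe and demo_case_caps() are
hypothetical, and the model treats both attributes symmetrically for
clarity rather than reproducing the probe code line for line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions as defined in the nfs_fs_sb.h hunk of this patch. */
#define DEMO_CAP_CASE_INSENSITIVE   (1u << 6)
#define DEMO_CAP_CASE_NONPRESERVING (1u << 7)

/* Hypothetical probe result: was the attribute returned, and its value. */
struct demo_attr {
	bool present;
	bool value;
};

struct demo_case_probe {
	struct demo_attr case_insensitive;
	struct demo_attr case_preserving;
};

/* Set a cap only when the server affirms a departure from the POSIX default. */
uint32_t demo_case_caps(const struct demo_case_probe *p)
{
	uint32_t caps = 0;

	assert(p != NULL);

	/* Default is case-sensitive: needs an explicit "true". */
	if (p->case_insensitive.present && p->case_insensitive.value)
		caps |= DEMO_CAP_CASE_INSENSITIVE;

	/* Default is case-preserving: needs an explicit "false". */
	if (p->case_preserving.present && !p->case_preserving.value)
		caps |= DEMO_CAP_CASE_NONPRESERVING;

	return caps;
}
```

With this shape, a server that omits both attributes and a server
that reports POSIX behavior yield the same empty cap set.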

Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfs/client.c           | 22 +++++++++++++++++-----
 fs/nfs/inode.c            | 23 +++++++++++++++++++++++
 fs/nfs/internal.h         |  3 +++
 fs/nfs/nfs3proc.c         |  2 ++
 fs/nfs/nfs3xdr.c          |  7 +++++--
 fs/nfs/nfs4proc.c         |  7 +++++--
 fs/nfs/proc.c             |  3 +++
 fs/nfs/symlink.c          |  3 +++
 include/linux/nfs_fs_sb.h |  2 +-
 include/linux/nfs_xdr.h   |  2 ++
 10 files changed, 64 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index be02bb227741..2f4d41ecfa71 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -933,15 +933,27 @@ static int nfs_probe_fsinfo(struct nfs_server *server, struct nfs_fh *mntfh, str
 
 	nfs_server_set_fsinfo(server, &fsinfo);
 
-	/* Get some general file system info */
-	if (server->namelen == 0) {
-		struct nfs_pathconf pathinfo;
+	{
+		struct nfs_pathconf pathinfo = { };
 
 		pathinfo.fattr = fattr;
 		nfs_fattr_init(fattr);
 
-		if (clp->rpc_ops->pathconf(server, mntfh, &pathinfo) >= 0)
-			server->namelen = pathinfo.max_namelen;
+		if (clp->rpc_ops->pathconf(server, mntfh, &pathinfo) >= 0) {
+			if (server->namelen == 0)
+				server->namelen = pathinfo.max_namelen;
+			/*
+			 * NFSv4 PATHCONF does not carry the case-sensitivity
+			 * fields; those caps are set from FATTR4_CASE_*
+			 * attributes during the set_capabilities probe.
+			 */
+			if (clp->rpc_ops->version < 4) {
+				if (pathinfo.case_insensitive)
+					server->caps |= NFS_CAP_CASE_INSENSITIVE;
+				if (!pathinfo.case_preserving)
+					server->caps |= NFS_CAP_CASE_NONPRESERVING;
+			}
+		}
 	}
 
 	if (clp->rpc_ops->discover_trunking != NULL &&
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 98a8f0de1199..bce2466552c4 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -41,6 +41,7 @@
 #include <linux/freezer.h>
 #include <linux/uaccess.h>
 #include <linux/iversion.h>
+#include <linux/fileattr.h>
 
 #include "nfs4_fs.h"
 #include "callback.h"
@@ -1101,6 +1102,28 @@ int nfs_getattr(struct mnt_idmap *idmap, const struct path *path,
 }
 EXPORT_SYMBOL_GPL(nfs_getattr);
 
+int nfs_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+	struct inode *inode = d_inode(dentry);
+
+	/*
+	 * Case handling is a property of the exported filesystem on the
+	 * NFS server, reported to the client at mount via PATHCONF
+	 * (NFSv3) or FATTR4_CASE_INSENSITIVE / FATTR4_CASE_PRESERVING
+	 * (NFSv4). Unlike filesystems that always preserve case, an NFS
+	 * mount may front a backend that does not, so both flags can
+	 * appear.
+	 */
+	if (nfs_server_capable(inode, NFS_CAP_CASE_INSENSITIVE)) {
+		fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+		fa->flags |= FS_CASEFOLD_FL;
+	}
+	if (nfs_server_capable(inode, NFS_CAP_CASE_NONPRESERVING))
+		fa->fsx_xflags |= FS_XFLAG_CASENONPRESERVING;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(nfs_fileattr_get);
+
 static void nfs_init_lock_context(struct nfs_lock_context *l_ctx)
 {
 	refcount_set(&l_ctx->count, 1);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index fc5456377160..309d3f679bb3 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -449,6 +449,9 @@ extern void nfs_set_cache_invalid(struct inode *inode, unsigned long flags);
 extern bool nfs_check_cache_invalid(struct inode *, unsigned long);
 extern int nfs_wait_bit_killable(struct wait_bit_key *key, int mode);
 
+struct file_kattr;
+int nfs_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
+
 #if IS_ENABLED(CONFIG_NFS_LOCALIO)
 /* localio.c */
 struct nfs_local_dio {
diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index 95d7cd564b74..b80d0c5efc27 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -1053,6 +1053,7 @@ static const struct inode_operations nfs3_dir_inode_operations = {
 	.permission	= nfs_permission,
 	.getattr	= nfs_getattr,
 	.setattr	= nfs_setattr,
+	.fileattr_get	= nfs_fileattr_get,
 #ifdef CONFIG_NFS_V3_ACL
 	.listxattr	= nfs3_listxattr,
 	.get_inode_acl	= nfs3_get_acl,
@@ -1064,6 +1065,7 @@ static const struct inode_operations nfs3_file_inode_operations = {
 	.permission	= nfs_permission,
 	.getattr	= nfs_getattr,
 	.setattr	= nfs_setattr,
+	.fileattr_get	= nfs_fileattr_get,
 #ifdef CONFIG_NFS_V3_ACL
 	.listxattr	= nfs3_listxattr,
 	.get_inode_acl	= nfs3_get_acl,
diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
index e17d72908412..e745e78faab0 100644
--- a/fs/nfs/nfs3xdr.c
+++ b/fs/nfs/nfs3xdr.c
@@ -2276,8 +2276,11 @@ static int decode_pathconf3resok(struct xdr_stream *xdr,
 	if (unlikely(!p))
 		return -EIO;
 	result->max_link = be32_to_cpup(p++);
-	result->max_namelen = be32_to_cpup(p);
-	/* ignore remaining fields */
+	result->max_namelen = be32_to_cpup(p++);
+	p++;	/* ignore no_trunc */
+	p++;	/* ignore chown_restricted */
+	result->case_insensitive = be32_to_cpup(p++) != 0;
+	result->case_preserving = be32_to_cpup(p) != 0;
 	return 0;
 }
 
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index d839a97df822..034e3e87e863 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3944,8 +3944,9 @@ static int _nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *f
 			server->caps |= NFS_CAP_SYMLINKS;
 		if (res.case_insensitive)
 			server->caps |= NFS_CAP_CASE_INSENSITIVE;
-		if (res.case_preserving)
-			server->caps |= NFS_CAP_CASE_PRESERVING;
+		if ((res.attr_bitmask[0] & FATTR4_WORD0_CASE_PRESERVING) &&
+		    !res.case_preserving)
+			server->caps |= NFS_CAP_CASE_NONPRESERVING;
 #ifdef CONFIG_NFS_V4_SECURITY_LABEL
 		if (res.attr_bitmask[2] & FATTR4_WORD2_SECURITY_LABEL)
 			server->caps |= NFS_CAP_SECURITY_LABEL;
@@ -10598,6 +10599,7 @@ static const struct inode_operations nfs4_dir_inode_operations = {
 	.getattr	= nfs_getattr,
 	.setattr	= nfs_setattr,
 	.listxattr	= nfs4_listxattr,
+	.fileattr_get	= nfs_fileattr_get,
 };
 
 static const struct inode_operations nfs4_file_inode_operations = {
@@ -10605,6 +10607,7 @@ static const struct inode_operations nfs4_file_inode_operations = {
 	.getattr	= nfs_getattr,
 	.setattr	= nfs_setattr,
 	.listxattr	= nfs4_listxattr,
+	.fileattr_get	= nfs_fileattr_get,
 };
 
 static struct nfs_server *nfs4_clone_server(struct nfs_server *source,
diff --git a/fs/nfs/proc.c b/fs/nfs/proc.c
index 70795684b8e8..03c2c1f31be9 100644
--- a/fs/nfs/proc.c
+++ b/fs/nfs/proc.c
@@ -598,6 +598,7 @@ nfs_proc_pathconf(struct nfs_server *server, struct nfs_fh *fhandle,
 {
 	info->max_link = 0;
 	info->max_namelen = NFS2_MAXNAMLEN;
+	info->case_preserving = true;
 	return 0;
 }
 
@@ -718,12 +719,14 @@ static const struct inode_operations nfs_dir_inode_operations = {
 	.permission	= nfs_permission,
 	.getattr	= nfs_getattr,
 	.setattr	= nfs_setattr,
+	.fileattr_get	= nfs_fileattr_get,
 };
 
 static const struct inode_operations nfs_file_inode_operations = {
 	.permission	= nfs_permission,
 	.getattr	= nfs_getattr,
 	.setattr	= nfs_setattr,
+	.fileattr_get	= nfs_fileattr_get,
 };
 
 const struct nfs_rpc_ops nfs_v2_clientops = {
diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c
index 58146e935402..74a072896f8d 100644
--- a/fs/nfs/symlink.c
+++ b/fs/nfs/symlink.c
@@ -22,6 +22,8 @@
 #include <linux/mm.h>
 #include <linux/string.h>
 
+#include "internal.h"
+
 /* Symlink caching in the page cache is even more simplistic
  * and straight-forward than readdir caching.
  */
@@ -74,4 +76,5 @@ const struct inode_operations nfs_symlink_inode_operations = {
 	.get_link	= nfs_get_link,
 	.getattr	= nfs_getattr,
 	.setattr	= nfs_setattr,
+	.fileattr_get	= nfs_fileattr_get,
 };
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 4daee27fa5eb..34d294774f8c 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -306,7 +306,7 @@ struct nfs_server {
 #define NFS_CAP_ATOMIC_OPEN	(1U << 4)
 #define NFS_CAP_LGOPEN		(1U << 5)
 #define NFS_CAP_CASE_INSENSITIVE	(1U << 6)
-#define NFS_CAP_CASE_PRESERVING	(1U << 7)
+#define NFS_CAP_CASE_NONPRESERVING	(1U << 7)
 #define NFS_CAP_REBOOT_LAYOUTRETURN	(1U << 8)
 #define NFS_CAP_OFFLOAD_STATUS	(1U << 9)
 #define NFS_CAP_ZERO_RANGE	(1U << 10)
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index ff1f12aa73d2..7c2057e40f99 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -182,6 +182,8 @@ struct nfs_pathconf {
 	struct nfs_fattr	*fattr; /* Post-op attributes */
 	__u32			max_link; /* max # of hard links */
 	__u32			max_namelen; /* max name length */
+	bool			case_insensitive;
+	bool			case_preserving;
 };
 
 struct nfs4_change_info {

-- 
2.53.0


^ permalink raw reply related
