* [PATCH v3 8/9] kernel/api: add API specification for sys_write
From: Sasha Levin @ 2026-04-24 16:51 UTC
To: linux-api, linux-kernel
Cc: linux-doc, linux-fsdevel, linux-kbuild, linux-kselftest,
workflows, tools, x86, Thomas Gleixner, Paul E . McKenney,
Greg Kroah-Hartman, Jonathan Corbet, Dmitry Vyukov, Randy Dunlap,
Cyril Hrubis, Kees Cook, Jake Edge, David Laight, Askar Safin,
Gabriele Paoloni, Mauro Carvalho Chehab, Christian Brauner,
Alexander Viro, Andrew Morton, Masahiro Yamada, Shuah Khan,
Ingo Molnar, Arnd Bergmann, Sasha Levin
In-Reply-To: <20260424165130.2306833-1-sashal@kernel.org>
Add KAPI-annotated kerneldoc for the sys_write system call in
fs/read_write.c.
The specification documents parameter constraints (fd, user buffer,
count), error conditions, locking requirements, signal handling
behavior, and short write semantics.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
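For reviewers who want to exercise the short-write and EINTR semantics described above from user space, the loop below is a minimal sketch. The helper name write_all() is hypothetical and not part of this patch, and treating a zero return as EIO is a choice made for the sketch rather than kernel behavior.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * write_all() is a hypothetical helper, not part of the patch.
 * Returns 0 once all len bytes are written, or -1 with errno set.
 */
static int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	size_t total = 0;

	assert(buf != NULL || len == 0);

	while (total < len) {
		ssize_t n = write(fd, p + total, len - total);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before any byte was written */
			return -1;		/* real error: errno is already set */
		}
		if (n == 0) {
			errno = EIO;		/* no progress: bail out (sketch's choice) */
			return -1;
		}
		total += (size_t)n;		/* short write: loop for the remainder */
	}
	return 0;
}
```

Unlike the loop in the kerneldoc examples, this variant retries on EINTR instead of breaking out on any negative return.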
fs/read_write.c | 391 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 391 insertions(+)
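The MAX_RW_COUNT clamp documented in this patch is plain arithmetic and can be checked in user space. The sketch below assumes the definition INT_MAX & PAGE_MASK quoted in the kerneldoc; max_rw_count() and clamp_count() are hypothetical names that mirror the expression, not kernel functions.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical user-space mirror of the kernel's INT_MAX & PAGE_MASK. */
static unsigned long max_rw_count(unsigned long page_size)
{
	/* PAGE_MASK only makes sense for a power-of-two page size. */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return (unsigned long)INT_MAX & ~(page_size - 1);
}

/* Mirrors "actual_count = min(count, MAX_RW_COUNT)" from the spec. */
static unsigned long clamp_count(unsigned long count, unsigned long page_size)
{
	unsigned long max = max_rw_count(page_size);

	return count < max ? count : max;
}
```

With 4 KiB pages the limit is 0x7ffff000. With 64 KiB pages it is 0x7fff0000, so the hex value in the spec holds only for the 4 KiB case.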
diff --git a/fs/read_write.c b/fs/read_write.c
index 258efd5b5793b..28312311df875 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1046,6 +1046,397 @@ ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count)
return ret;
}
+/**
+ * sys_write - Write data to a file descriptor
+ * @fd: File descriptor to write to
+ * @buf: User-space buffer containing data to write
+ * @count: Maximum number of bytes to write
+ *
+ * long-desc: Attempts to write up to count bytes from the buffer starting at
+ * buf to the file referred to by the file descriptor fd. For seekable files
+ * (regular files, block devices), the write begins at the current file offset,
+ * and the file offset is advanced by the number of bytes written. If the file
+ * was opened with O_APPEND, the file offset is first set to the end of the
+ * file before writing. For non-seekable files (pipes, FIFOs, sockets, character
+ * devices), the file offset is not used and writing occurs at the current
+ * position as defined by the device.
+ *
+ * The number of bytes written may be less than count if, for example, there is
+ * insufficient space on the underlying physical medium, or the RLIMIT_FSIZE
+ * resource limit is encountered, or the call was interrupted by a signal
+ * handler after having written less than count bytes. In the event of a
+ * successful partial write, the caller should make another write() call to
+ * transfer the remaining bytes. This behavior is called a "short write."
+ *
+ * On Linux, write() transfers at most MAX_RW_COUNT (0x7ffff000, approximately
+ * 2GB minus one page) bytes per call, regardless of whether the file or
+ * filesystem would allow more. This prevents signed arithmetic overflow.
+ *
+ * For regular files, a successful write() does not guarantee that data has been
+ * committed to disk. Use fsync(2) or fdatasync(2) if durability is required.
+ * For O_SYNC or O_DSYNC files, the kernel automatically syncs data on write.
+ *
+ * POSIX permits writes that are interrupted after partial writes to either
+ * return -1 with errno=EINTR, or to return the count of bytes already written.
+ * Linux implements the latter behavior: if some data has been written before
+ * a signal arrives, write() returns the number of bytes written rather than
+ * failing with EINTR.
+ *
+ * contexts: process, sleepable
+ *
+ * param: fd
+ * type: fd, input
+ * constraint-type: range(0, INT_MAX)
+ * cdesc: Must be a valid, open file descriptor with write permission.
+ * The file must have been opened with O_WRONLY or O_RDWR. File descriptors
+ * opened with O_RDONLY, O_PATH, or that have been closed return EBADF.
+ * Standard file descriptors 0 (stdin), 1 (stdout), 2 (stderr) are valid if
+ * open and writable. AT_FDCWD and other special values are not valid.
+ *
+ * param: buf
+ * type: user_ptr, input
+ * constraint-type: buffer(2)
+ * cdesc: Must point to a valid, readable user-space memory region of at
+ * least count bytes. The buffer is validated via access_ok() before any
+ * write operation. NULL is invalid and returns EFAULT. For O_DIRECT writes,
+ * the buffer may need to be aligned to the filesystem's block size (varies
+ * by filesystem; query with statx() using STATX_DIOALIGN on Linux 6.1+).
+ *
+ * param: count
+ * type: uint, input
+ * constraint-type: range(0, SIZE_MAX)
+ * cdesc: Maximum number of bytes to write. Clamped internally to
+ * MAX_RW_COUNT (INT_MAX & PAGE_MASK, 0x7ffff000 with 4 KiB pages) to
+ * prevent signed overflow. A count of 0 is passed to the underlying file
+ * operation and typically returns 0, but may trigger filesystem- or
+ * driver-specific side effects. count cast to ssize_t must not be negative.
+ *
+ * return:
+ * type: int
+ * check-type: range
+ * success: >= 0
+ * desc: On success, returns the number of bytes written (non-negative). Zero
+ * indicates that nothing was written (count was 0, or no space available
+ * for non-blocking writes). The return value may be less than count due to
+ * resource limits, signal interruption, or device constraints (short write).
+ * On error, returns a negative error code.
+ *
+ * error: EBADF, Bad file descriptor
+ * desc: fd is not a valid file descriptor, or fd was not opened for writing.
+ * This includes file descriptors opened with O_RDONLY, O_PATH, or file
+ * descriptors that have been closed. Also returned if the file structure
+ * does not have FMODE_WRITE set.
+ *
+ * error: EFAULT, Bad address
+ * desc: buf points outside the accessible address space. The buffer address
+ * failed access_ok() validation. Can also occur if a fault happens during
+ * copy_from_user() when reading data from user space.
+ *
+ * error: EINVAL, Invalid argument
+ * desc: Returned in several cases: (1) The file descriptor refers to an
+ * object that is not suitable for writing (no write or write_iter method).
+ * (2) The file was opened with O_DIRECT and the buffer alignment, offset,
+ * or count does not meet the filesystem's alignment requirements. (3) The
+ * count argument, when cast to ssize_t, is negative. Also returned if the
+ * file lacks the FMODE_CAN_WRITE flag.
+ *
+ * error: EAGAIN, Resource temporarily unavailable
+ * desc: fd refers to a file (pipe, socket, device) that is marked non-blocking
+ * (O_NONBLOCK) and the write would block because the buffer is full.
+ * Equivalent to EWOULDBLOCK. The application should retry later or use
+ * select/poll/epoll to wait for writability.
+ *
+ * error: EWOULDBLOCK, Operation would block
+ * desc: Alias of EAGAIN on Linux (identical errno value). POSIX permits
+ * implementations to distinguish the two; Linux does not. Listed here
+ * for completeness so tooling that consults the spec does not treat
+ * EWOULDBLOCK-returning call sites as undocumented. See EAGAIN above
+ * for the conditions that trigger it.
+ *
+ * error: EINTR, Interrupted system call
+ * desc: The call was interrupted by a signal before any data was written. This
+ * only occurs if no data has been transferred; if some data was written
+ * before the signal, the call returns the number of bytes written. The
+ * caller should typically restart the write.
+ *
+ * error: EPIPE, Broken pipe
+ * desc: fd refers to a pipe or socket whose reading end has been closed.
+ * When this condition occurs, the calling process also receives a SIGPIPE
+ * signal. If the signal is caught or ignored, EPIPE is still returned.
+ * For sockets, MSG_NOSIGNAL (via send()) suppresses the signal. For
+ * pwritev2(), the RWF_NOSIGNAL flag suppresses it.
+ *
+ * error: EFBIG, File too large
+ * desc: An attempt was made to write a file that exceeds the implementation-
+ * defined maximum file size or the file size limit (RLIMIT_FSIZE) of the
+ * process. When RLIMIT_FSIZE is exceeded, the process also receives SIGXFSZ.
+ * For files not opened with O_LARGEFILE on 32-bit systems, the limit is 2GB.
+ *
+ * error: ENOSPC, No space left on device
+ * desc: The device containing the file has no room for the data. This can
+ * occur mid-write, giving a short write first and ENOSPC on the retry.
+ *
+ * error: EDQUOT, Disk quota exceeded
+ * desc: The user's quota of disk blocks on the filesystem has been exhausted.
+ * Like ENOSPC, this can result in a short write.
+ *
+ * error: EIO, Input/output error
+ * desc: A low-level I/O error occurred while modifying the inode or writing
+ * data. This typically indicates hardware failure, filesystem corruption,
+ * or network filesystem timeout. Some data may have been written.
+ *
+ * error: EPERM, Operation not permitted
+ * desc: The operation was prevented: (1) by a file seal (F_SEAL_WRITE or
+ * F_SEAL_FUTURE_WRITE on memfd/shmem), (2) writing to an immutable inode
+ * (IS_IMMUTABLE), (3) by an LSM hook denying the operation, or (4) by a
+ * fanotify permission event denying the write.
+ *
+ * error: EOVERFLOW, Value too large for defined data type
+ * desc: The file position plus count would exceed LLONG_MAX. Also returned
+ * when the offset would exceed filesystem limits after the write.
+ *
+ * error: EDESTADDRREQ, Destination address required
+ * desc: fd is a datagram socket for which no peer address has been set using
+ * connect(2). Use sendto(2) to specify the destination address.
+ *
+ * error: ETXTBSY, Text file busy
+ * desc: The file is being used as a swap file (IS_SWAPFILE). Note: unlike
+ * the traditional Unix meaning, Linux does not return ETXTBSY when writing
+ * to an executing binary; that only blocks open() with O_WRONLY/O_RDWR.
+ *
+ * error: EXDEV, Cross-device link
+ * desc: When writing to a pipe that has been configured as a watch queue
+ * (CONFIG_WATCH_QUEUE), direct write() calls are not supported.
+ *
+ * error: ENOMEM, Out of memory
+ * desc: Insufficient kernel memory was available for the write operation.
+ * For pipes, this occurs when allocating pages for the pipe buffer.
+ *
+ * error: ERESTARTSYS, Restart system call (internal)
+ * desc: Internal error code indicating the syscall should be restarted. This
+ * is converted to EINTR if SA_RESTART is not set on the signal handler, or
+ * the syscall is transparently restarted if SA_RESTART is set. User space
+ * should not see this error code directly.
+ *
+ * error: EACCES, Permission denied
+ * desc: The security subsystem (LSM such as SELinux or AppArmor) denied the
+ * write operation via security_file_permission(). This can occur even if
+ * the file was successfully opened.
+ *
+ * lock: file->f_pos_lock
+ * type: mutex
+ * acquired: conditional
+ * released: true
+ * desc: For regular files that require atomic position updates (FMODE_ATOMIC_POS),
+ * the f_pos_lock mutex is acquired by fdget_pos() at syscall entry and released
+ * by fdput_pos() at syscall exit. This serializes concurrent writes sharing
+ * the same file description. Not acquired for stream files (FMODE_STREAM like
+ * pipes and sockets) or when the file is not shared.
+ *
+ * lock: sb->s_writers (freeze protection)
+ * type: custom
+ * acquired: conditional
+ * released: true
+ * desc: For regular files, file_start_write() acquires freeze protection on
+ * the superblock via sb_start_write() before the write, and file_end_write()
+ * releases it after. This prevents writes during filesystem freeze. Not
+ * acquired for non-regular files (pipes, sockets, devices).
+ *
+ * lock: inode->i_rwsem
+ * type: rwlock
+ * acquired: conditional
+ * released: true
+ * desc: For regular files using generic_file_write_iter(), the inode's i_rwsem
+ * is acquired in write mode before modifying file data. This is internal to
+ * the filesystem and released before return. Not all filesystems use this
+ * pattern.
+ *
+ * lock: pipe->mutex
+ * type: mutex
+ * acquired: conditional
+ * released: true
+ * desc: For pipes and FIFOs, the pipe's mutex is held while modifying pipe
+ * buffers. Released temporarily while waiting for space, then reacquired.
+ *
+ * lock: RCU read-side
+ * type: rcu
+ * acquired: conditional
+ * released: true
+ * desc: Held transiently during file descriptor table lookup within fdget().
+ * The RCU read lock is acquired and released internally by the fd lookup
+ * path, not held across the entire syscall. fdput() releases the file
+ * reference count, not the RCU lock.
+ *
+ * signal: SIGPIPE
+ * direction: send
+ * action: terminate
+ * condition: Writing to a pipe or socket with no readers
+ * desc: When writing to a pipe whose read end is closed, or a socket whose
+ * peer has closed, SIGPIPE is sent to the calling process. The default
+ * action terminates the process. Use signal(SIGPIPE, SIG_IGN) to suppress
+ * for write(). EPIPE is returned regardless of signal disposition.
+ * timing: during
+ *
+ * signal: SIGXFSZ
+ * direction: send
+ * action: coredump
+ * condition: Writing exceeds RLIMIT_FSIZE
+ * desc: When a write would exceed the soft file size limit (RLIMIT_FSIZE),
+ * SIGXFSZ is sent. The default action terminates with a core dump. The
+ * write returns EFBIG. If RLIMIT_FSIZE is RLIM_INFINITY, no signal is sent.
+ * timing: during
+ *
+ * signal: Any signal
+ * direction: receive
+ * action: return
+ * condition: While blocked waiting for space (pipes, sockets)
+ * desc: The syscall may be interrupted by signals while waiting for buffer
+ * space to become available. If interrupted before any data is written,
+ * returns -EINTR or -ERESTARTSYS. If data was already written, returns the
+ * byte count. Restartable if SA_RESTART is set and no data was written.
+ * errno: -EINTR
+ * timing: during
+ * restartable: yes
+ *
+ * side-effect: file_position
+ * target: file->f_pos
+ * condition: For seekable files when write succeeds (returns > 0)
+ * desc: The file offset (f_pos) is advanced by the number of bytes written.
+ * For files opened with O_APPEND, f_pos is first set to file size. For
+ * stream files (FMODE_STREAM such as pipes and sockets), the offset is not
+ * used or modified. Position updates are protected by f_pos_lock when
+ * shared.
+ * reversible: no
+ *
+ * side-effect: modify_state
+ * target: inode timestamps (mtime, ctime)
+ * condition: When write succeeds (returns > 0)
+ * desc: Updates the file's modification time (mtime) and change time (ctime)
+ * via file_update_time(). The update precision depends on filesystem mount
+ * options (fine-grained timestamps for multigrain inodes).
+ * reversible: no
+ *
+ * side-effect: modify_state
+ * target: SUID/SGID bits (mode)
+ * condition: When writing to a setuid/setgid file
+ * desc: The SUID bit is cleared when a non-root user writes to a file with
+ * the bit set. The SGID bit may also be cleared. This is a security feature
+ * to prevent privilege escalation via modified setuid binaries. Done via
+ * file_remove_privs().
+ * reversible: no
+ *
+ * side-effect: modify_state
+ * target: file data
+ * condition: When write succeeds (returns > 0)
+ * desc: Modifies the file's data content. For regular files, data is written
+ * to the page cache (buffered I/O) or directly to storage (O_DIRECT).
+ * Buffered data may not be persistent until fsync(2) or fdatasync(2) returns.
+ * reversible: no
+ *
+ * side-effect: modify_state
+ * target: task I/O accounting
+ * condition: Always
+ * desc: Updates the current task's I/O accounting statistics. The wchar field
+ * (write characters) is incremented by bytes written via add_wchar() only on
+ * successful writes (ret > 0). The syscw field (syscall write count) is
+ * incremented via inc_syscw() when the write operation is attempted
+ * (after passing initial validation checks). These statistics are visible
+ * in /proc/[pid]/io.
+ * reversible: no
+ *
+ * side-effect: modify_state
+ * target: fsnotify events
+ * condition: When write returns > 0
+ * desc: Generates an FS_MODIFY fsnotify event via fsnotify_modify(), allowing
+ * inotify, fanotify, and dnotify watchers to be notified of the write.
+ *
+ * capability: CAP_DAC_OVERRIDE
+ * type: bypass_check
+ * allows: Bypass discretionary access control on write permission
+ * without: Standard DAC checks are enforced
+ * condition: Checked at open time via inode_permission(), not during write()
+ *
+ * capability: CAP_FSETID
+ * type: bypass_check
+ * allows: Bypass ownership checks for SUID/SGID clearing
+ * without: SUID/SGID bits are cleared on write by non-owner
+ * condition: Checked during file_remove_privs()
+ *
+ * constraint: MAX_RW_COUNT
+ * desc: The count parameter is silently clamped to MAX_RW_COUNT (INT_MAX &
+ * PAGE_MASK, approximately 2GB minus one page) to prevent integer overflow
+ * in internal calculations. This is transparent to the caller.
+ * expr: actual_count = min(count, MAX_RW_COUNT)
+ *
+ * constraint: File must be open for writing
+ * desc: The file descriptor must have been opened with O_WRONLY or O_RDWR.
+ * Files opened with O_RDONLY or O_PATH cannot be written and return EBADF.
+ * The file must have both FMODE_WRITE and FMODE_CAN_WRITE flags set.
+ * expr: (file->f_mode & FMODE_WRITE) && (file->f_mode & FMODE_CAN_WRITE)
+ *
+ * constraint: RLIMIT_FSIZE
+ * desc: The size of data written is constrained by the RLIMIT_FSIZE resource
+ * limit. A write that starts at or beyond the limit fails with EFBIG and
+ * raises SIGXFSZ. A write that starts below the limit and would cross it is
+ * truncated at the limit and returns a short count.
+ * expr: pos + count <= rlimit(RLIMIT_FSIZE) || rlimit(RLIMIT_FSIZE) == RLIM_INFINITY
+ *
+ * constraint: File seals
+ * desc: For memfd or shmem files with F_SEAL_WRITE or F_SEAL_FUTURE_WRITE
+ * seals applied, all write operations fail with EPERM. With F_SEAL_GROW,
+ * writes that would extend file size fail with EPERM.
+ *
+ * examples: n = write(fd, buf, sizeof(buf)); // Basic write
+ * n = write(STDOUT_FILENO, msg, strlen(msg)); // Write to stdout
+ * // Handle short writes:
+ * while (total < len) {
+ * n = write(fd, buf + total, len - total);
+ * if (n < 0) break;
+ * total += n;
+ * }
+ * // Pipe error handling:
+ * if (write(pipefd[1], &byte, 1) < 0 && errno == EPIPE)
+ * handle_broken_pipe();
+ *
+ * notes: The behavior of write() varies significantly depending on the type of
+ * file descriptor:
+ *
+ * - Regular files: Writes to the page cache (buffered) or directly to storage
+ * (O_DIRECT). Short writes are rare except near RLIMIT_FSIZE or disk full.
+ * With O_APPEND, the seek to end of file and the write are one atomic step.
+ *
+ * - Pipes and FIFOs: Blocking by default. Writes up to PIPE_BUF (4096 bytes
+ * on Linux) are guaranteed atomic. Larger writes may be interleaved with
+ * writes from other processes. Blocks if pipe is full; returns EAGAIN with
+ * O_NONBLOCK. SIGPIPE/EPIPE if no readers.
+ *
+ * - Sockets: Behavior depends on socket type and protocol. Stream sockets
+ * (TCP) may return partial writes. Datagram sockets (UDP) typically write
+ * complete messages or fail. SIGPIPE/EPIPE for broken connections (unless
+ * MSG_NOSIGNAL). EDESTADDRREQ for unconnected datagram sockets.
+ *
+ * - Terminals: May block on flow control. Canonical vs raw mode affects
+ * behavior. Special characters may be interpreted.
+ *
+ * - Device special files: Behavior is device-specific. Block devices behave
+ * similarly to regular files. Character device behavior varies.
+ *
+ * Race condition considerations: Concurrent writes from threads sharing a
+ * file description race on the file position. Linux 3.14+ provides atomic
+ * position updates via f_pos_lock for regular files (FMODE_ATOMIC_POS), but
+ * for maximum safety, use pwrite() for concurrent positioned writes.
+ *
+ * O_DIRECT writes bypass the page cache and typically require buffer and
+ * offset alignment to filesystem block size. Query requirements via statx()
+ * with STATX_DIOALIGN (Linux 6.1+). Unaligned O_DIRECT writes return EINVAL
+ * on most filesystems.
+ *
+ * For zero-copy writes, consider using splice(2), sendfile(2), or vmsplice(2)
+ * instead of copying data through user-space buffers with write().
+ *
+ * Partial writes (short writes) must be handled by application code.
+ * Applications should loop until all data is written or an error occurs.
+ */
SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf,
size_t, count)
{
--
2.53.0
* [PATCH v3 9/9] kernel/api: add runtime verification selftest
From: Sasha Levin @ 2026-04-24 16:51 UTC
To: linux-api, linux-kernel
Cc: linux-doc, linux-fsdevel, linux-kbuild, linux-kselftest,
workflows, tools, x86, Thomas Gleixner, Paul E . McKenney,
Greg Kroah-Hartman, Jonathan Corbet, Dmitry Vyukov, Randy Dunlap,
Cyril Hrubis, Kees Cook, Jake Edge, David Laight, Askar Safin,
Gabriele Paoloni, Mauro Carvalho Chehab, Christian Brauner,
Alexander Viro, Andrew Morton, Masahiro Yamada, Shuah Khan,
Ingo Molnar, Arnd Bergmann, Sasha Levin
In-Reply-To: <20260424165130.2306833-1-sashal@kernel.org>
Add a selftest for CONFIG_KAPI_RUNTIME_CHECKS that exercises
sys_open/sys_read/sys_write/sys_close through raw syscall() and
verifies KAPI pre-validation catches invalid parameters while
allowing valid operations through.
Test cases (TAP output):
1-4: Valid open/read/write/close succeed
5-7: Invalid flags, mode bits, NULL path rejected with EINVAL
8: dmesg contains expected KAPI warning strings
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
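The EPIPE/SIGPIPE behavior documented for sys_write can be reproduced with the same raw syscall() and signal-flag pattern the selftest uses. The following is a minimal sketch, not code from test_kapi.c. The names saw_sigpipe, on_sigpipe() and write_to_readerless_pipe() are hypothetical, and it assumes SIGPIPE is not blocked in the caller.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

static volatile sig_atomic_t saw_sigpipe;

static void on_sigpipe(int sig)
{
	(void)sig;
	saw_sigpipe = 1;
}

/*
 * Hypothetical helper: write one byte to a pipe that has no readers.
 * Returns the errno of the failed write, 0 if the write unexpectedly
 * succeeded, or -1 if setup failed.
 */
static int write_to_readerless_pipe(void)
{
	struct sigaction sa = { 0 }, old;
	int p[2];
	int err = 0;

	sa.sa_handler = on_sigpipe;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGPIPE, &sa, &old) < 0)
		return -1;
	if (pipe(p) < 0) {
		sigaction(SIGPIPE, &old, NULL);
		return -1;
	}
	close(p[0]);			/* no readers remain */

	saw_sigpipe = 0;
	/* syscall() returns -1 and sets errno on failure. */
	if (syscall(SYS_write, p[1], "x", 1) < 0)
		err = errno;

	close(p[1]);
	sigaction(SIGPIPE, &old, NULL);	/* restore previous disposition */
	return err;
}
```

Because the handler is installed, the signal is caught and the write still fails with EPIPE, which is the combination the spec describes for a caught or ignored SIGPIPE.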
MAINTAINERS | 1 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/kapi/Makefile | 7 +
tools/testing/selftests/kapi/kapi_test_util.h | 33 +
tools/testing/selftests/kapi/test_kapi.c | 1096 +++++++++++++++++
5 files changed, 1138 insertions(+)
create mode 100644 tools/testing/selftests/kapi/Makefile
create mode 100644 tools/testing/selftests/kapi/kapi_test_util.h
create mode 100644 tools/testing/selftests/kapi/test_kapi.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 0d14205077908..ddfd9cad98916 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13826,6 +13826,7 @@ F: include/linux/kernel_api_spec.h
F: kernel/api/
F: tools/kapi/
F: tools/lib/python/kdoc/kdoc_apispec.py
+F: tools/testing/selftests/kapi/
KERNEL AUTOMOUNTER
M: Ian Kent <raven@themaw.net>
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 450f13ba4cca9..7881bec5aafe1 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -48,6 +48,7 @@ TARGETS += intel_pstate
TARGETS += iommu
TARGETS += ipc
TARGETS += ir
+TARGETS += kapi
TARGETS += kcmp
TARGETS += kexec
TARGETS += kselftest_harness
diff --git a/tools/testing/selftests/kapi/Makefile b/tools/testing/selftests/kapi/Makefile
new file mode 100644
index 0000000000000..32a750901b111
--- /dev/null
+++ b/tools/testing/selftests/kapi/Makefile
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0
+
+TEST_GEN_PROGS := test_kapi
+
+CFLAGS += -static -Wall -Wextra -Werror -O2 $(KHDR_INCLUDES)
+
+include ../lib.mk
diff --git a/tools/testing/selftests/kapi/kapi_test_util.h b/tools/testing/selftests/kapi/kapi_test_util.h
new file mode 100644
index 0000000000000..e097c370542ad
--- /dev/null
+++ b/tools/testing/selftests/kapi/kapi_test_util.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2026 Sasha Levin <sashal@kernel.org>
+ *
+ * Compatibility helpers for KAPI selftests.
+ *
+ * __NR_open is not defined on aarch64 and riscv64 (only __NR_openat exists).
+ * Provide a wrapper that uses __NR_openat with AT_FDCWD to achieve the same
+ * behavior as __NR_open on architectures that lack it.
+ */
+#ifndef KAPI_TEST_UTIL_H
+#define KAPI_TEST_UTIL_H
+
+#include <fcntl.h>
+#include <sys/syscall.h>
+
+#ifndef __NR_open
+/*
+ * On architectures without __NR_open (e.g., aarch64, riscv64),
+ * use openat(AT_FDCWD, ...) which is equivalent.
+ */
+static inline long kapi_sys_open(const char *pathname, int flags, int mode)
+{
+ return syscall(__NR_openat, AT_FDCWD, pathname, flags, mode);
+}
+#else
+static inline long kapi_sys_open(const char *pathname, int flags, int mode)
+{
+ return syscall(__NR_open, pathname, flags, mode);
+}
+#endif
+
+#endif /* KAPI_TEST_UTIL_H */
diff --git a/tools/testing/selftests/kapi/test_kapi.c b/tools/testing/selftests/kapi/test_kapi.c
new file mode 100644
index 0000000000000..a6b7576f95c3e
--- /dev/null
+++ b/tools/testing/selftests/kapi/test_kapi.c
@@ -0,0 +1,1096 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2026 Sasha Levin <sashal@kernel.org>
+ *
+ * Userspace selftest for KAPI runtime verification of syscall parameters.
+ *
+ * Exercises sys_open, sys_read, sys_write, and sys_close through raw
+ * syscall() to ensure KAPI pre-validation wrappers interact correctly
+ * with normal kernel error handling.
+ *
+ * Requires CONFIG_KAPI_RUNTIME_CHECKS=y for full coverage; many tests
+ * also pass without it.
+ *
+ * TAP output format.
+ */
+
+#define _GNU_SOURCE
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <errno.h>
+#include <signal.h>
+#include <sys/syscall.h>
+#include <sys/stat.h>
+#include <linux/limits.h>
+#include "../kselftest.h"
+#include "kapi_test_util.h"
+
+#define NUM_TESTS 29
+
+/*
+ * Set from the SIGPIPE handler. `volatile sig_atomic_t` is the POSIX-
+ * mandated type for flags touched by async-signal-safe handlers;
+ * checkpatch's generic "volatile considered harmful" warning targets
+ * kernel code and does not apply here.
+ */
+static volatile sig_atomic_t got_sigpipe;
+
+/*
+ * The tap_* helpers are thin wrappers around ksft_test_result_* so the
+ * test bodies keep their short tap_ok()/tap_fail()/tap_skip() calls while
+ * the output goes through the shared kselftest harness.
+ */
+static void tap_ok(const char *desc)
+{
+ ksft_test_result_pass("%s\n", desc);
+}
+
+static void tap_fail(const char *desc, const char *reason)
+{
+ ksft_test_result_fail("%s: %s\n", desc, reason);
+}
+
+static void tap_skip(const char *desc, const char *reason)
+{
+ ksft_test_result_skip("%s: %s\n", desc, reason);
+}
+
+/*
+ * Return true when the kernel provides the kapi runtime-check surface.
+ * Tests that rely on KAPI rejecting bad parameters pre-call should be
+ * skipped on kernels without it, not reported as failures.
+ */
+static bool kapi_runtime_checks_active(void)
+{
+ struct stat st;
+
+ return stat("/sys/kernel/debug/kapi", &st) == 0 && S_ISDIR(st.st_mode);
+}
+
+static void sigpipe_handler(int sig)
+{
+ (void)sig;
+ got_sigpipe = 1;
+}
+
+/* ---- Valid operation tests ---- */
+
+/*
+ * Test 1: open a readable file
+ * Returns fd on success.
+ */
+static int test_open_valid(void)
+{
+ errno = 0;
+ long fd = kapi_sys_open("/etc/hostname", O_RDONLY, 0);
+
+ if (fd >= 0) {
+ tap_ok("open valid file");
+ } else {
+ /* /etc/hostname might not exist; try /etc/passwd */
+ errno = 0;
+ fd = kapi_sys_open("/etc/passwd", O_RDONLY, 0);
+ if (fd >= 0)
+ tap_ok("open valid file (fallback /etc/passwd)");
+ else
+ tap_fail("open valid file", strerror(errno));
+ }
+ return (int)fd;
+}
+
+/*
+ * Test 2: read from fd
+ */
+static void test_read_valid(int fd)
+{
+ char buf[256];
+
+ errno = 0;
+ long ret = syscall(__NR_read, fd, buf, sizeof(buf));
+
+ if (ret > 0)
+ tap_ok("read from valid fd");
+ else if (ret == 0)
+ tap_ok("read from valid fd (EOF)");
+ else
+ tap_fail("read from valid fd", strerror(errno));
+}
+
+/*
+ * Test 3: write to /dev/null
+ */
+static void test_write_valid(void)
+{
+ errno = 0;
+ long devnull = kapi_sys_open("/dev/null", O_WRONLY, 0);
+
+ if (devnull < 0) {
+ tap_fail("write to /dev/null (open failed)", strerror(errno));
+ return;
+ }
+
+ errno = 0;
+ long ret = syscall(__NR_write, (int)devnull, "hello", 5);
+
+ if (ret == 5)
+ tap_ok("write to /dev/null");
+ else
+ tap_fail("write to /dev/null",
+ ret < 0 ? strerror(errno) : "short write");
+
+ syscall(__NR_close, (int)devnull);
+}
+
+/*
+ * Test 4: close fd
+ */
+static void test_close_valid(int fd)
+{
+ errno = 0;
+ long ret = syscall(__NR_close, fd);
+
+ if (ret == 0)
+ tap_ok("close valid fd");
+ else
+ tap_fail("close valid fd", strerror(errno));
+}
+
+/* ---- KAPI parameter rejection tests ---- */
+
+/*
+ * Test 5: open with invalid flag bits
+ * 0x10000000 is outside the valid O_* mask, so KAPI should reject it.
+ */
+static void test_open_invalid_flags(void)
+{
+ long ret;
+
+ if (!kapi_runtime_checks_active()) {
+ tap_skip("open with invalid flags",
+ "CONFIG_KAPI_RUNTIME_CHECKS not enabled");
+ return;
+ }
+
+ errno = 0;
+ /*
+ * Use /dev/null (always present on any sane rootfs) so KAPI's flag
+ * validation is reached before a path-lookup ENOENT can mask it.
+ * 0x10000000 is outside the valid O_* mask.
+ */
+ ret = kapi_sys_open("/dev/null", 0x10000000, 0);
+
+ if (ret == -1 && errno == EINVAL) {
+ tap_ok("open with invalid flags returns EINVAL");
+ } else if (ret >= 0) {
+ tap_fail("open with invalid flags", "expected EINVAL, got success");
+ syscall(__NR_close, (int)ret);
+ } else {
+ char msg[64];
+
+ snprintf(msg, sizeof(msg), "expected EINVAL, got %s",
+ strerror(errno));
+ tap_fail("open with invalid flags", msg);
+ }
+}
+
+/*
+ * Test 6: open with invalid mode bits
+ * 0xFFFF has bits outside S_IALLUGO (07777), so KAPI should reject it.
+ */
+static void test_open_invalid_mode(void)
+{
+ long ret;
+
+ if (!kapi_runtime_checks_active()) {
+ tap_skip("open with invalid mode",
+ "CONFIG_KAPI_RUNTIME_CHECKS not enabled");
+ return;
+ }
+
+ errno = 0;
+ ret = kapi_sys_open("/tmp/kapi_test_mode",
+ O_CREAT | O_WRONLY | O_EXCL, 0xFFFF);
+
+ if (ret == -1 && errno == EINVAL) {
+ tap_ok("open with invalid mode returns EINVAL");
+ } else if (ret >= 0) {
+ tap_fail("open with invalid mode", "expected EINVAL, got success");
+ syscall(__NR_close, (int)ret);
+ unlink("/tmp/kapi_test_mode");
+ } else {
+ char msg[64];
+
+ snprintf(msg, sizeof(msg), "expected EINVAL, got %s",
+ strerror(errno));
+ tap_fail("open with invalid mode", msg);
+ }
+}
+
+/*
+ * Test 7: open with NULL path
+ * KAPI USER_PATH constraint should reject NULL.
+ */
+static void test_open_null_path(void)
+{
+ errno = 0;
+ long ret = kapi_sys_open(NULL, O_RDONLY, 0);
+
+ if (ret == -1 && errno == EINVAL) {
+ tap_ok("open with NULL path returns EINVAL");
+ } else if (ret == -1 && errno == EFAULT) {
+ /* Kernel may catch this as EFAULT before KAPI */
+ tap_ok("open with NULL path returns EFAULT (acceptable)");
+ } else if (ret >= 0) {
+ tap_fail("open with NULL path", "expected error, got success");
+ syscall(__NR_close, (int)ret);
+ } else {
+ char msg[64];
+
+ snprintf(msg, sizeof(msg), "got %s", strerror(errno));
+ tap_fail("open with NULL path", msg);
+ }
+}
+
+/*
+ * Test 8: open with flag bit 30 set (0x40000000)
+ * This bit is outside the valid O_* mask, so KAPI should return EINVAL.
+ */
+static void test_open_flag_bit30(void)
+{
+ long ret;
+
+ if (!kapi_runtime_checks_active()) {
+ tap_skip("open with flag bit 30 (0x40000000) returns EINVAL",
+ "CONFIG_KAPI_RUNTIME_CHECKS not enabled");
+ return;
+ }
+
+ errno = 0;
+ ret = kapi_sys_open("/dev/null", 0x40000000, 0);
+
+ if (ret == -1 && errno == EINVAL) {
+ tap_ok("open with flag bit 30 (0x40000000) returns EINVAL");
+ } else if (ret >= 0) {
+ tap_fail("open with flag bit 30 (0x40000000) returns EINVAL",
+ "expected EINVAL, got success");
+ syscall(__NR_close, (int)ret);
+ } else {
+ char msg[64];
+
+ snprintf(msg, sizeof(msg), "expected EINVAL, got %s",
+ strerror(errno));
+ tap_fail("open with flag bit 30 (0x40000000) returns EINVAL",
+ msg);
+ }
+}
+
+/* ---- Boundary condition and error path tests ---- */
+
+/*
+ * Test 9: read with fd=-1 should return an error.
+ * With CONFIG_KAPI_RUNTIME_CHECKS=y, KAPI validates the fd first and
+ * rejects negative fds (other than AT_FDCWD) with EINVAL. Without
+ * KAPI, the kernel returns EBADF. Accept either.
+ */
+static void test_read_bad_fd(void)
+{
+ char buf[16];
+
+ errno = 0;
+ long ret = syscall(__NR_read, -1, buf, sizeof(buf));
+
+ if (ret == -1 && (errno == EBADF || errno == EINVAL)) {
+ tap_ok("read with fd=-1 returns error");
+ } else {
+ char msg[64];
+
+ snprintf(msg, sizeof(msg), "expected EBADF/EINVAL, got %s",
+ ret >= 0 ? "success" : strerror(errno));
+ tap_fail("read with fd=-1 returns error", msg);
+ }
+}
+
+/*
+ * Test 10: read with count=0 should return 0
+ */
+static void test_read_zero_count(void)
+{
+ char buf[1];
+ long fd;
+
+ errno = 0;
+ fd = kapi_sys_open("/dev/null", O_RDONLY, 0);
+ if (fd < 0) {
+ tap_fail("read with count=0 returns 0",
+ "cannot open /dev/null");
+ return;
+ }
+
+ errno = 0;
+ long ret = syscall(__NR_read, (int)fd, buf, 0);
+
+ if (ret == 0) {
+ tap_ok("read with count=0 returns 0");
+ } else {
+ char msg[64];
+
+ snprintf(msg, sizeof(msg), "expected 0, got %ld (errno=%s)",
+ ret, strerror(errno));
+ tap_fail("read with count=0 returns 0", msg);
+ }
+
+ syscall(__NR_close, (int)fd);
+}
+
+/*
+ * Test 11: write with count=0 should return 0
+ */
+static void test_write_zero_count(void)
+{
+ long fd;
+
+ errno = 0;
+ fd = kapi_sys_open("/dev/null", O_WRONLY, 0);
+ if (fd < 0) {
+ tap_fail("write with count=0 returns 0",
+ "cannot open /dev/null");
+ return;
+ }
+
+ errno = 0;
+ long ret = syscall(__NR_write, (int)fd, "x", 0);
+
+ if (ret == 0) {
+ tap_ok("write with count=0 returns 0");
+ } else {
+ char msg[64];
+
+ snprintf(msg, sizeof(msg), "expected 0, got %ld (errno=%s)",
+ ret, strerror(errno));
+ tap_fail("write with count=0 returns 0", msg);
+ }
+
+ syscall(__NR_close, (int)fd);
+}
+
+/*
+ * Test 12: open with a path longer than PATH_MAX should fail
+ * Expect ENAMETOOLONG or EINVAL.
+ */
+static void test_open_long_path(void)
+{
+ char *longpath;
+ size_t len = PATH_MAX + 256;
+
+ longpath = malloc(len);
+ if (!longpath) {
+ tap_fail("open with path > PATH_MAX", "malloc failed");
+ return;
+ }
+
+ memset(longpath, 'A', len - 1);
+ longpath[0] = '/';
+ longpath[len - 1] = '\0';
+
+ errno = 0;
+ long ret = kapi_sys_open(longpath, O_RDONLY, 0);
+
+ if (ret == -1 && (errno == ENAMETOOLONG || errno == EINVAL)) {
+ tap_ok("open with path > PATH_MAX returns ENAMETOOLONG/EINVAL");
+ } else if (ret >= 0) {
+ tap_fail("open with path > PATH_MAX",
+ "expected error, got success");
+ syscall(__NR_close, (int)ret);
+ } else {
+ char msg[64];
+
+ snprintf(msg, sizeof(msg),
+ "expected ENAMETOOLONG/EINVAL, got %s",
+ strerror(errno));
+ tap_fail("open with path > PATH_MAX", msg);
+ }
+
+ free(longpath);
+}
+
+/*
+ * Test 13: read with unmapped user pointer should return EFAULT or EINVAL.
+ * Use a pipe with data so the kernel actually tries to copy to the buffer.
+ */
+static void test_read_unmapped_buf(void)
+{
+ int pipefd[2];
+
+ if (pipe(pipefd) < 0) {
+ tap_fail("read with unmapped buffer returns EFAULT/EINVAL",
+ "pipe() failed");
+ return;
+ }
+
+ /* Write some data so read has something to copy */
+ (void)write(pipefd[1], "hello", 5);
+
+ errno = 0;
+ long ret = syscall(__NR_read, pipefd[0], (void *)0xDEAD0000, 16);
+
+ if (ret == -1 && (errno == EFAULT || errno == EINVAL)) {
+ tap_ok("read with unmapped buffer returns EFAULT/EINVAL");
+ } else {
+ char msg[64];
+
+ snprintf(msg, sizeof(msg),
+ "expected EFAULT/EINVAL, got %s",
+ ret >= 0 ? "success" : strerror(errno));
+ tap_fail("read with unmapped buffer returns EFAULT/EINVAL",
+ msg);
+ }
+
+ close(pipefd[0]);
+ close(pipefd[1]);
+}
+
+/*
+ * Test 14: write with unmapped user pointer should return EFAULT or EINVAL.
+ * Use a pipe so the kernel actually tries to copy from the buffer.
+ */
+static void test_write_unmapped_buf(void)
+{
+ int pipefd[2];
+
+ if (pipe(pipefd) < 0) {
+ tap_fail("write with unmapped buffer returns EFAULT/EINVAL",
+ "pipe() failed");
+ return;
+ }
+
+ errno = 0;
+ long ret = syscall(__NR_write, pipefd[1], (void *)0xDEAD0000, 16);
+
+ if (ret == -1 && (errno == EFAULT || errno == EINVAL)) {
+ tap_ok("write with unmapped buffer returns EFAULT/EINVAL");
+ } else {
+ char msg[64];
+
+ snprintf(msg, sizeof(msg),
+ "expected EFAULT/EINVAL, got %s",
+ ret >= 0 ? "success" : strerror(errno));
+ tap_fail("write with unmapped buffer returns EFAULT/EINVAL",
+ msg);
+ }
+
+ close(pipefd[0]);
+ close(pipefd[1]);
+}
+
+/*
+ * Test 15: close an already-closed fd should return EBADF
+ */
+static void test_close_already_closed(void)
+{
+ long fd;
+
+ errno = 0;
+ fd = kapi_sys_open("/dev/null", O_RDONLY, 0);
+ if (fd < 0) {
+ tap_fail("close already-closed fd returns EBADF",
+ "cannot open /dev/null");
+ return;
+ }
+
+ /* Close it once - should succeed */
+ syscall(__NR_close, (int)fd);
+
+ /* Close it again - should fail with EBADF */
+ errno = 0;
+ long ret = syscall(__NR_close, (int)fd);
+
+ if (ret == -1 && errno == EBADF) {
+ tap_ok("close already-closed fd returns EBADF");
+ } else {
+ char msg[64];
+
+ snprintf(msg, sizeof(msg), "expected EBADF, got %s",
+ ret == 0 ? "success" : strerror(errno));
+ tap_fail("close already-closed fd returns EBADF", msg);
+ }
+}
+
+/*
+ * Test 16: open /dev/null with O_RDONLY|O_CLOEXEC should succeed
+ */
+static void test_open_valid_cloexec(void)
+{
+ errno = 0;
+ long fd = kapi_sys_open("/dev/null", O_RDONLY | O_CLOEXEC, 0);
+
+ if (fd >= 0) {
+ tap_ok("open /dev/null with O_RDONLY|O_CLOEXEC succeeds");
+ syscall(__NR_close, (int)fd);
+ } else {
+ char msg[64];
+
+ snprintf(msg, sizeof(msg), "expected success, got %s",
+ strerror(errno));
+ tap_fail("open /dev/null with O_RDONLY|O_CLOEXEC succeeds",
+ msg);
+ }
+}
+
+/*
+ * Test 17: write 0 bytes to /dev/null should return 0
+ */
+static void test_write_zero_devnull(void)
+{
+ long fd;
+
+ errno = 0;
+ fd = kapi_sys_open("/dev/null", O_WRONLY, 0);
+ if (fd < 0) {
+ tap_fail("write 0 bytes to /dev/null returns 0",
+ "cannot open /dev/null");
+ return;
+ }
+
+ errno = 0;
+ long ret = syscall(__NR_write, (int)fd, "", 0);
+
+ if (ret == 0) {
+ tap_ok("write 0 bytes to /dev/null returns 0");
+ } else {
+ char msg[64];
+
+ snprintf(msg, sizeof(msg), "expected 0, got %ld (errno=%s)",
+ ret, strerror(errno));
+ tap_fail("write 0 bytes to /dev/null returns 0", msg);
+ }
+
+ syscall(__NR_close, (int)fd);
+}
+
+/*
+ * Test 18: read from a write-only fd should return EBADF
+ */
+static void test_read_writeonly_fd(void)
+{
+ long fd;
+
+ errno = 0;
+ fd = kapi_sys_open("/dev/null", O_WRONLY, 0);
+ if (fd < 0) {
+ tap_fail("read from write-only fd returns EBADF",
+ "cannot open /dev/null");
+ return;
+ }
+
+ char buf[16];
+
+ errno = 0;
+ long ret = syscall(__NR_read, (int)fd, buf, sizeof(buf));
+
+ if (ret == -1 && errno == EBADF) {
+ tap_ok("read from write-only fd returns EBADF");
+ } else {
+ char msg[64];
+
+ snprintf(msg, sizeof(msg), "expected EBADF, got %s",
+ ret >= 0 ? "success" : strerror(errno));
+ tap_fail("read from write-only fd returns EBADF", msg);
+ }
+
+ syscall(__NR_close, (int)fd);
+}
+
+/*
+ * Test 19: write to a read-only fd should return EBADF
+ */
+static void test_write_readonly_fd(void)
+{
+ long fd;
+
+ errno = 0;
+ fd = kapi_sys_open("/dev/null", O_RDONLY, 0);
+ if (fd < 0) {
+ tap_fail("write to read-only fd returns EBADF",
+ "cannot open /dev/null");
+ return;
+ }
+
+ errno = 0;
+ long ret = syscall(__NR_write, (int)fd, "hello", 5);
+
+ if (ret == -1 && errno == EBADF) {
+ tap_ok("write to read-only fd returns EBADF");
+ } else {
+ char msg[64];
+
+ snprintf(msg, sizeof(msg), "expected EBADF, got %s",
+ ret >= 0 ? "success" : strerror(errno));
+ tap_fail("write to read-only fd returns EBADF", msg);
+ }
+
+ syscall(__NR_close, (int)fd);
+}
+
+/*
+ * Test 20: close fd 9999 (likely invalid) should return EBADF
+ */
+static void test_close_fd_9999(void)
+{
+ errno = 0;
+ long ret = syscall(__NR_close, 9999);
+
+ if (ret == -1 && errno == EBADF) {
+ tap_ok("close fd 9999 returns EBADF");
+ } else {
+ char msg[64];
+
+ snprintf(msg, sizeof(msg), "expected EBADF, got %s",
+ ret == 0 ? "success" : strerror(errno));
+ tap_fail("close fd 9999 returns EBADF", msg);
+ }
+}
+
+/*
+ * Test 21: read from pipe after write end is closed returns 0 (EOF)
+ */
+static void test_read_closed_pipe(void)
+{
+ int pipefd[2];
+
+ if (pipe(pipefd) < 0) {
+ tap_fail("read from closed pipe returns 0 (EOF)",
+ "pipe() failed");
+ return;
+ }
+
+ /* Close write end */
+ close(pipefd[1]);
+
+ char buf[16];
+
+ errno = 0;
+ long ret = syscall(__NR_read, pipefd[0], buf, sizeof(buf));
+
+ if (ret == 0) {
+ tap_ok("read from closed pipe returns 0 (EOF)");
+ } else {
+ char msg[64];
+
+ snprintf(msg, sizeof(msg), "expected 0, got %ld (errno=%s)",
+ ret, ret < 0 ? strerror(errno) : "n/a");
+ tap_fail("read from closed pipe returns 0 (EOF)", msg);
+ }
+
+ close(pipefd[0]);
+}
+
+/*
+ * Test 22: write to pipe after read end is closed returns EPIPE + SIGPIPE
+ */
+static void test_write_closed_pipe(void)
+{
+ int pipefd[2];
+ struct sigaction sa, old_sa;
+
+ if (pipe(pipefd) < 0) {
+ tap_fail("write to closed pipe returns EPIPE + SIGPIPE",
+ "pipe() failed");
+ return;
+ }
+
+ /* Install SIGPIPE handler */
+ memset(&sa, 0, sizeof(sa));
+ sa.sa_handler = sigpipe_handler;
+ sigemptyset(&sa.sa_mask);
+ sigaction(SIGPIPE, &sa, &old_sa);
+
+ got_sigpipe = 0;
+
+ /* Close read end */
+ close(pipefd[0]);
+
+ errno = 0;
+ long ret = syscall(__NR_write, pipefd[1], "hello", 5);
+
+ if (ret == -1 && errno == EPIPE && got_sigpipe) {
+ tap_ok("write to closed pipe returns EPIPE + SIGPIPE");
+ } else if (ret == -1 && errno == EPIPE) {
+ tap_ok("write to closed pipe returns EPIPE (SIGPIPE not caught)");
+ } else {
+ char msg[128];
+
+ snprintf(msg, sizeof(msg),
+ "expected EPIPE, got %s (sigpipe=%d)",
+ ret >= 0 ? "success" : strerror(errno),
+ (int)got_sigpipe);
+ tap_fail("write to closed pipe returns EPIPE + SIGPIPE", msg);
+ }
+
+ /* Restore SIGPIPE handler */
+ sigaction(SIGPIPE, &old_sa, NULL);
+ close(pipefd[1]);
+}
+
+/*
+ * Test 23: open /dev/null (not a directory) with O_DIRECTORY returns ENOTDIR
+ */
+static void test_open_directory_on_file(void)
+{
+ errno = 0;
+ long ret = kapi_sys_open("/dev/null", O_RDONLY | O_DIRECTORY, 0);
+
+ if (ret == -1 && errno == ENOTDIR) {
+ tap_ok("open O_DIRECTORY on regular file returns ENOTDIR");
+ } else if (ret >= 0) {
+ tap_fail("open O_DIRECTORY on regular file",
+ "expected ENOTDIR, got success");
+ syscall(__NR_close, (int)ret);
+ } else {
+ char msg[64];
+
+ snprintf(msg, sizeof(msg), "expected ENOTDIR, got %s",
+ strerror(errno));
+ tap_fail("open O_DIRECTORY on regular file", msg);
+ }
+}
+
+/*
+ * Test 24: open nonexistent file without O_CREAT returns ENOENT
+ */
+static void test_open_nonexistent(void)
+{
+ errno = 0;
+ long ret = kapi_sys_open("/tmp/kapi_nonexistent_file_12345",
+ O_RDONLY, 0);
+
+ if (ret == -1 && errno == ENOENT) {
+ tap_ok("open nonexistent file without O_CREAT returns ENOENT");
+ } else if (ret >= 0) {
+ tap_fail("open nonexistent file",
+ "expected ENOENT, got success (file exists?)");
+ syscall(__NR_close, (int)ret);
+ } else {
+ char msg[64];
+
+ snprintf(msg, sizeof(msg), "expected ENOENT, got %s",
+ strerror(errno));
+ tap_fail("open nonexistent file", msg);
+ }
+}
+
+/*
+ * Test 25: close stdin (fd 0) should succeed
+ * We dup it first so we can restore it.
+ */
+static void test_close_stdin(void)
+{
+ int saved_stdin = dup(0);
+
+ if (saved_stdin < 0) {
+ tap_fail("close stdin succeeds", "cannot dup stdin");
+ return;
+ }
+
+ errno = 0;
+ long ret = syscall(__NR_close, 0);
+
+ if (ret == 0) {
+ tap_ok("close stdin (fd 0) succeeds");
+ } else {
+ char msg[64];
+
+ snprintf(msg, sizeof(msg), "expected success, got %s",
+ strerror(errno));
+ tap_fail("close stdin (fd 0) succeeds", msg);
+ }
+
+ /* Restore stdin */
+ dup2(saved_stdin, 0);
+ close(saved_stdin);
+}
+
+/*
+ * Test 26: read after close returns EBADF
+ */
+static void test_read_after_close(void)
+{
+ long fd;
+
+ errno = 0;
+ fd = kapi_sys_open("/dev/null", O_RDONLY, 0);
+ if (fd < 0) {
+ tap_fail("read after close returns EBADF",
+ "cannot open /dev/null");
+ return;
+ }
+
+ syscall(__NR_close, (int)fd);
+
+ char buf[16];
+
+ errno = 0;
+ long ret = syscall(__NR_read, (int)fd, buf, sizeof(buf));
+
+ if (ret == -1 && errno == EBADF) {
+ tap_ok("read after close returns EBADF");
+ } else {
+ char msg[64];
+
+ snprintf(msg, sizeof(msg), "expected EBADF, got %s",
+ ret >= 0 ? "success" : strerror(errno));
+ tap_fail("read after close returns EBADF", msg);
+ }
+}
+
+/*
+ * Test 27: write with large count
+ * Without KAPI: the kernel clamps count to MAX_RW_COUNT and succeeds.
+ * With KAPI: KAPI validates the buffer against the count and may
+ * return EFAULT/EINVAL since the buffer is smaller than count.
+ * Accept either success or EFAULT/EINVAL.
+ */
+static void test_write_large_count(void)
+{
+ long fd;
+ char buf[64] = "test data";
+
+ errno = 0;
+ fd = kapi_sys_open("/dev/null", O_WRONLY, 0);
+ if (fd < 0) {
+ tap_fail("write with large count handled correctly",
+ "cannot open /dev/null");
+ return;
+ }
+
+ errno = 0;
+ long ret = syscall(__NR_write, (int)fd, buf, (size_t)0x7ffff000UL);
+
+ if (ret > 0) {
+ tap_ok("write with large count succeeds (clamped, no KAPI)");
+ } else if (ret == -1 && (errno == EFAULT || errno == EINVAL)) {
+ tap_ok("write with large count returns EFAULT/EINVAL (KAPI validates buffer)");
+ } else {
+ char msg[64];
+
+ snprintf(msg, sizeof(msg), "expected success or EFAULT/EINVAL, got %s",
+ ret == 0 ? "zero" : strerror(errno));
+ tap_fail("write with large count handled correctly", msg);
+ }
+
+ syscall(__NR_close, (int)fd);
+}
+
+/* ---- Integration tests ---- */
+
+/*
+ * Test 28: full normal syscall path - open, read, write, close
+ * Verify KAPI does not interfere with normal operations.
+ */
+static void test_normal_path(void)
+{
+ long rd_fd, wr_fd;
+ char buf[128];
+ int ok = 1;
+ char reason[128] = "";
+
+ /* Open a readable file */
+ errno = 0;
+ rd_fd = kapi_sys_open("/etc/hostname", O_RDONLY, 0);
+ if (rd_fd < 0) {
+ errno = 0;
+ rd_fd = kapi_sys_open("/etc/passwd", O_RDONLY, 0);
+ }
+ if (rd_fd < 0) {
+ snprintf(reason, sizeof(reason), "open readable file: %s",
+ strerror(errno));
+ ok = 0;
+ }
+
+ /* Read from it */
+ if (ok) {
+ errno = 0;
+ long n = syscall(__NR_read, (int)rd_fd, buf, sizeof(buf));
+
+ if (n < 0) {
+ snprintf(reason, sizeof(reason), "read: %s",
+ strerror(errno));
+ ok = 0;
+ }
+ }
+
+ /* Open /dev/null for writing */
+ wr_fd = -1;
+ if (ok) {
+ errno = 0;
+ wr_fd = kapi_sys_open("/dev/null", O_WRONLY, 0);
+ if (wr_fd < 0) {
+ snprintf(reason, sizeof(reason),
+ "open /dev/null: %s", strerror(errno));
+ ok = 0;
+ }
+ }
+
+ /* Write to /dev/null */
+ if (ok) {
+ errno = 0;
+ long n = syscall(__NR_write, (int)wr_fd, "test", 4);
+
+ if (n != 4) {
+ snprintf(reason, sizeof(reason), "write: %s",
+ n < 0 ? strerror(errno) : "short write");
+ ok = 0;
+ }
+ }
+
+ /* Close both fds */
+ if (rd_fd >= 0) {
+ errno = 0;
+ if (syscall(__NR_close, (int)rd_fd) != 0 && ok) {
+ snprintf(reason, sizeof(reason), "close read fd: %s",
+ strerror(errno));
+ ok = 0;
+ }
+ }
+
+ if (wr_fd >= 0) {
+ errno = 0;
+ if (syscall(__NR_close, (int)wr_fd) != 0 && ok) {
+ snprintf(reason, sizeof(reason), "close write fd: %s",
+ strerror(errno));
+ ok = 0;
+ }
+ }
+
+ if (ok)
+ tap_ok("normal syscall path (open/read/write/close) works");
+ else
+ tap_fail("normal syscall path (open/read/write/close) works",
+ reason);
+}
+
+/*
+ * Test 29: verify dmesg contains KAPI warnings for the invalid tests
+ */
+static void test_dmesg_warnings(void)
+{
+ int kmsg_fd = open("/dev/kmsg", O_RDONLY | O_NONBLOCK);
+
+ if (kmsg_fd < 0) {
+ tap_skip("dmesg contains expected KAPI warnings",
+ "cannot open /dev/kmsg");
+ return;
+ }
+
+ /*
+ * Seek with SEEK_DATA, which /dev/kmsg defines as "after the last
+ * record present at the last SYSLOG_ACTION_CLEAR" (the oldest
+ * record if the buffer was never cleared). Older kernels (or
+ * CONFIG_PRINTK=n builds) may reject the seek with -EINVAL; we
+ * can't audit past warnings then, so skip rather than fail.
+ */
+ if (lseek(kmsg_fd, 0, SEEK_DATA) == (off_t)-1) {
+ tap_skip("dmesg contains expected KAPI warnings",
+ "lseek(SEEK_DATA) not supported on /dev/kmsg");
+ close(kmsg_fd);
+ return;
+ }
+
+ char line[4096];
+ int found_invalid_bits = 0;
+ int found_null = 0;
+ ssize_t n;
+
+ for (;;) {
+ n = read(kmsg_fd, line, sizeof(line) - 1);
+ if (n > 0) {
+ line[n] = '\0';
+ if (strstr(line, "contains invalid bits"))
+ found_invalid_bits++;
+ if (strstr(line, "NULL") && strstr(line, "not allowed"))
+ found_null++;
+ } else if (n == -1 && errno == EPIPE) {
+ /* Ring buffer wrapped, continue reading */
+ continue;
+ } else {
+ /* EAGAIN (no more messages) or other error */
+ break;
+ }
+ }
+
+ close(kmsg_fd);
+
+ if (found_invalid_bits >= 2 && found_null >= 1) {
+ tap_ok("dmesg contains expected KAPI warnings");
+ } else if (found_invalid_bits >= 1 || found_null >= 1) {
+ char msg[128];
+
+ snprintf(msg, sizeof(msg),
+ "partial: invalid_bits=%d null=%d",
+ found_invalid_bits, found_null);
+ tap_ok(msg);
+ } else {
+ tap_fail("dmesg KAPI warnings",
+ "no KAPI warnings found in dmesg");
+ }
+}
+
+int main(void)
+{
+ ksft_print_header();
+ ksft_set_plan(NUM_TESTS);
+
+ /* Valid operations (1-4) */
+ int fd = test_open_valid();
+
+ if (fd >= 0)
+ test_read_valid(fd);
+ else
+ tap_fail("read from valid fd", "no fd from open");
+
+ test_write_valid();
+
+ if (fd >= 0)
+ test_close_valid(fd);
+ else
+ tap_fail("close valid fd", "no fd from open");
+
+ /* KAPI parameter rejection (5-8) */
+ test_open_invalid_flags();
+ test_open_invalid_mode();
+ test_open_null_path();
+ test_open_flag_bit30();
+
+ /* Boundary conditions and error paths (9-20) */
+ test_read_bad_fd();
+ test_read_zero_count();
+ test_write_zero_count();
+ test_open_long_path();
+ test_read_unmapped_buf();
+ test_write_unmapped_buf();
+ test_close_already_closed();
+ test_open_valid_cloexec();
+ test_write_zero_devnull();
+ test_read_writeonly_fd();
+ test_write_readonly_fd();
+ test_close_fd_9999();
+
+ /* Pipe and lifecycle tests (21-27) */
+ test_read_closed_pipe();
+ test_write_closed_pipe();
+ test_open_directory_on_file();
+ test_open_nonexistent();
+ test_close_stdin();
+ test_read_after_close();
+ test_write_large_count();
+
+ /* Integration (28-29) */
+ test_normal_path();
+ test_dmesg_warnings();
+
+ ksft_finished();
+ return 0;
+}
--
2.53.0
* [PATCH v11 00/15] Exposing case folding behavior
From: Chuck Lever @ 2026-04-25 1:53 UTC (permalink / raw)
To: Al Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Darrick J. Wong, Roland Mainz, Steve French
Following on from:
https://lore.kernel.org/linux-nfs/20251021-zypressen-bazillus-545a44af57fd@brauner/T/#m0ba197d75b7921d994cf284f3cef3a62abb11aaa
I'm attempting to implement enough support in the Linux VFS to
enable file services like NFSD and ksmbd (and user space
equivalents) to provide the actual status of case folding support
in local file systems. The default behavior for local file systems
not explicitly supported in this series is to reflect the usual
POSIX behaviors:
case-insensitive = false
case-nonpreserving = false
The case-insensitivity and case-nonpreserving booleans can be
consumed immediately by NFSD. These two attributes have been part of
the NFSv3 and NFSv4 protocols for decades, in order to support NFS
client implementations on non-POSIX systems.
Support for user space file servers is why this series exposes case
folding information via a user-space API. I don't know of any other
category of user-space application that requires access to case
folding info.
The Linux NFS community has a growing interest in supporting NFS
clients on Windows and macOS platforms, where file name behavior does
not align with traditional POSIX semantics.
One example of a Windows-based NFS client is [1]. This client
implementation explicitly requires servers to report
FATTR4_WORD0_CASE_INSENSITIVE = TRUE for proper operation, a hard
requirement for Windows client interoperability because Windows
applications expect case-insensitive behavior. When an NFS client
knows the server is case-insensitive, it can avoid issuing multiple
LOOKUP/READDIR requests to search for case variants, and applications
like Win32 programs work correctly without manual workarounds or
code changes.
Even the Linux client can take advantage of this information. Trond
merged patches 4 years ago [2] that introduce support for case
insensitivity, in support of the Hammerspace NFS server. In
particular, when a client detects a case-insensitive NFS share,
negative dentry caching must be disabled (a lookup for "FILE.TXT"
failing shouldn't cache a negative entry when "file.txt" exists)
and directory change invalidation must clear all cached case-folded
file name variants.
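To make the invalidation hazard concrete, here is a toy model in plain C. It is not NFS client code: struct toy_dir and the toy_* helpers are hypothetical names invented for this sketch, and the "negative cache" is just an array of exact-byte names. It assumes a create only drops the negative entry spelled exactly like the new name, which is the behavior the paragraph above says is insufficient on a case-insensitive share.

```c
#include <stdbool.h>
#include <string.h>
#include <strings.h>

#define MAX_NAMES 8

/* Hypothetical toy model; not the kernel's dentry cache. */
struct toy_dir {
	const char *names[MAX_NAMES];		/* entries that exist on the "server" */
	int nr_names;
	const char *negative[MAX_NAMES];	/* names cached as "does not exist" */
	int nr_negative;
};

/* Server-side lookup on a case-insensitive export. */
static bool toy_server_lookup(const struct toy_dir *d, const char *name)
{
	for (int i = 0; i < d->nr_names; i++)
		if (strcasecmp(d->names[i], name) == 0)
			return true;
	return false;
}

/* Client lookup that trusts an exact-byte negative cache. */
static bool toy_cached_lookup(struct toy_dir *d, const char *name)
{
	for (int i = 0; i < d->nr_negative; i++)
		if (strcmp(d->negative[i], name) == 0)
			return false;	/* answered from the negative cache */
	if (toy_server_lookup(d, name))
		return true;
	if (d->nr_negative < MAX_NAMES)
		d->negative[d->nr_negative++] = name;
	return false;
}

/* Create that drops only the negative entry with the exact same spelling. */
static void toy_create(struct toy_dir *d, const char *name)
{
	if (d->nr_names < MAX_NAMES)
		d->names[d->nr_names++] = name;
	for (int i = 0; i < d->nr_negative; i++) {
		if (strcmp(d->negative[i], name) == 0) {
			d->negative[i] = d->negative[--d->nr_negative];
			break;
		}
	}
}
```

In this model, looking up "FILE.TXT" in an empty directory caches a negative entry; creating "file.txt" afterwards leaves that entry in place, so the cached lookup keeps answering "absent" even though the case-insensitive server lookup would now succeed.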
Hammerspace servers and several other NFS server implementations
operate in multi-protocol environments, where a single file service
instance caters to both NFS and SMB clients. In those cases, things
work more smoothly for everyone when the NFS client can see and adapt
to the case folding behavior that SMB users rely on and expect. NFSD
needs to support the case-insensitivity and case-nonpreserving
booleans properly in order to participate as a first-class citizen
in such environments.
[1] https://github.com/kofemann/ms-nfs41-client
[2] https://patchwork.kernel.org/project/linux-nfs/cover/20211217203658.439352-1-trondmy@kernel.org/
---
Changes since v10:
- cifs: Source case-handling flags from the server's cached
FS_ATTRIBUTE_INFORMATION reply instead of the nocase mount
option, with a nocase fallback when the reply is absent
- Address findings from sashiko(gemini-3) and gpt-5.5:
- nfs: Skip pathconf case bits on NFSv4 (set via FATTR4_CASE_*
instead)
- xfs: Hide FS_CASEFOLD_FL from the legacy flags view so
chattr round-trips do not hit the setflags whitelist
- ext4, f2fs: Drop redundant fileattr_get patches; the
FS_CASEFOLD_FL translation in fileattr_fill_flags() already
reports FS_XFLAG_CASEFOLD for casefolded directories
- nfsd: Report FATTR4_HOMOGENEOUS = FALSE when the exported
filesystem has a Unicode encoding, since per-directory
casefold makes the fs-scoped case attributes inhomogeneous
- nfsd: Document in nfsd_get_case_info() why -ENOIOCTLCMD and
-ENOTTY are swallowed while other errors propagate
- fat: Honor vfat 'check=strict' when reporting FS_XFLAG_CASEFOLD
- Set FS_CASEFOLD_FL so FS_IOC_GETFLAGS reflects case-insensitive
mount
- isofs: Register fileattr_get on regular file and symlink inodes,
not just directories
- nfsd: Query NFSv4 FATTR4_CASE_* from the parent directory for
non-directory objects, since casefold lives on the directory
Changes since v9:
- nfs: always probe PATHCONF for case caps. Default to case-
preserving when the server does not report case_preserving
- nfsd, ksmbd: tolerate -ENOTTY from vfs_fileattr_get() so
overlayfs exports on backing filesystems without fileattr_get
do not fail the RPC
- xfs: map FS_XFLAG_CASEFOLD inside xfs_ip2xflags() so BULKSTAT
and FS_IOC_FSGETXATTR report the flag consistently
- vboxsf: reject a short host reply to SHFL_INFO_VOLUME before
trusting volinfo.properties.case_sensitive
Changes since v8:
- Rebase on v7.0-rc1
Changes since v7:
- Split file_attr initialization changes into a separate patch
Changes since v6:
- Remove the memset from vfs_fileattr_get
Changes since v5:
- Finish the conversion to FS_XFLAGs
- NFSv4 GETATTR now clears the attr mask bit if nfsd_get_case_info()
fails
Changes since v4:
- Observe the MSDOS "nocase" mount option
- Define new FS_XFLAGs for the user API
Changes since v3:
- Change fa->case_preserving to fa_case_nonpreserving
- VFAT is case preserving
- Make new fields available to user space
Changes since v2:
- Remove unicode labels
- Replace vfs_get_case_info
- Add support for several more local file system implementations
- Add support for in-kernel SMB server
Changes since RFC:
- Use file_getattr instead of statx
- Postpone exposing Unicode version until later
- Support NTFS and ext4 in addition to FAT
- Support NFSv4 fattr4 in addition to NFSv3 PATHCONF
---
Changes in v11:
- Link to v10: https://patch.msgid.link/20260423-case-sensitivity-v10-0-c385d674a6cf@oracle.com
---
Chuck Lever (15):
fs: Move file_kattr initialization to callers
fs: Add case sensitivity flags to file_kattr
fat: Implement fileattr_get for case sensitivity
exfat: Implement fileattr_get for case sensitivity
ntfs3: Implement fileattr_get for case sensitivity
hfs: Implement fileattr_get for case sensitivity
hfsplus: Report case sensitivity in fileattr_get
xfs: Report case sensitivity in fileattr_get
cifs: Implement fileattr_get for case sensitivity
nfs: Implement fileattr_get for case sensitivity
vboxsf: Implement fileattr_get for case sensitivity
isofs: Implement fileattr_get for case sensitivity
nfsd: Report export case-folding via NFSv3 PATHCONF
nfsd: Implement NFSv4 FATTR4_CASE_INSENSITIVE and FATTR4_CASE_PRESERVING
ksmbd: Report filesystem case sensitivity via FS_ATTRIBUTE_INFORMATION
fs/exfat/exfat_fs.h | 2 ++
fs/exfat/file.c | 18 ++++++++++++--
fs/exfat/namei.c | 1 +
fs/fat/fat.h | 3 +++
fs/fat/file.c | 32 ++++++++++++++++++++++++
fs/fat/namei_msdos.c | 1 +
fs/fat/namei_vfat.c | 1 +
fs/file_attr.c | 16 ++++++------
fs/hfs/dir.c | 1 +
fs/hfs/hfs_fs.h | 2 ++
fs/hfs/inode.c | 14 +++++++++++
fs/hfsplus/inode.c | 12 +++++++++
fs/isofs/dir.c | 24 ++++++++++++++++++
fs/isofs/inode.c | 3 ++-
fs/isofs/isofs.h | 5 ++++
fs/nfs/client.c | 22 +++++++++++++----
fs/nfs/inode.c | 23 ++++++++++++++++++
fs/nfs/internal.h | 3 +++
fs/nfs/nfs3proc.c | 2 ++
fs/nfs/nfs3xdr.c | 7 ++++--
fs/nfs/nfs4proc.c | 7 ++++--
fs/nfs/proc.c | 3 +++
fs/nfs/symlink.c | 3 +++
fs/nfsd/nfs3proc.c | 18 ++++++++------
fs/nfsd/nfs4xdr.c | 55 +++++++++++++++++++++++++++++++++++++++---
fs/nfsd/vfs.c | 43 +++++++++++++++++++++++++++++++++
fs/nfsd/vfs.h | 3 +++
fs/ntfs3/file.c | 25 +++++++++++++++++++
fs/ntfs3/inode.c | 1 +
fs/ntfs3/namei.c | 2 ++
fs/ntfs3/ntfs_fs.h | 1 +
fs/smb/client/cifsfs.c | 42 ++++++++++++++++++++++++++++++++
fs/smb/server/smb2pdu.c | 30 ++++++++++++++++++-----
fs/vboxsf/dir.c | 1 +
fs/vboxsf/file.c | 6 +++--
fs/vboxsf/super.c | 7 ++++++
fs/vboxsf/utils.c | 30 +++++++++++++++++++++++
fs/vboxsf/vfsmod.h | 6 +++++
fs/xfs/libxfs/xfs_inode_util.c | 2 ++
fs/xfs/xfs_ioctl.c | 9 ++++++-
include/linux/fileattr.h | 3 ++-
include/linux/nfs_fs_sb.h | 2 +-
include/linux/nfs_xdr.h | 2 ++
include/uapi/linux/fs.h | 7 ++++++
44 files changed, 458 insertions(+), 42 deletions(-)
---
base-commit: 6596a02b207886e9e00bb0161c7fd59fea53c081
change-id: 20260422-case-sensitivity-5cbffc8f1558
Best regards,
--
Chuck Lever
* [PATCH v11 01/15] fs: Move file_kattr initialization to callers
From: Chuck Lever @ 2026-04-25 1:53 UTC (permalink / raw)
To: Al Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Darrick J. Wong, Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-0-de5619beddaf@oracle.com>
From: Chuck Lever <chuck.lever@oracle.com>
fileattr_fill_xflags() and fileattr_fill_flags() memset the
entire file_kattr struct before populating select fields, so
callers cannot pre-set fields in fa->fsx_xflags without having
their values clobbered. Darrick Wong noted that a function
named "fill_xflags" touching more than xflags forces callers
to know implementation details beyond its apparent scope.
Drop the memset from both fill functions and initialize at the
entry points instead: ioctl_setflags(), ioctl_fssetxattr(),
the file_setattr() syscall, and xfs_ioc_fsgetxattra() now
declare fa with an aggregate initializer. ioctl_getflags(),
ioctl_fsgetxattr(), and the file_getattr() syscall already
aggregate-initialize fa to pass flags_valid/fsx_valid hints
into vfs_fileattr_get().
Subsequent patches rely on this so that ->fileattr_get()
handlers can set case-sensitivity flags (FS_XFLAG_CASEFOLD,
FS_XFLAG_CASENONPRESERVING) in fa->fsx_xflags before the fill
functions run.
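The clobbering can be reproduced outside the kernel with a stripped-down model. The sketch below is only an illustration: struct demo_kattr, the DEMO_* constants and the demo_* functions are hypothetical stand-ins, not the kernel's layout or flag values. It mirrors the shape of fileattr_fill_flags(), which assigns ->flags and then ORs translated bits into ->fsx_xflags.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; not the kernel's values or struct layout. */
#define DEMO_SYNC_FL		0x1u
#define DEMO_XFLAG_SYNC		0x10u
#define DEMO_XFLAG_CASEFOLD	0x100u

struct demo_kattr {
	uint32_t flags;
	uint32_t fsx_xflags;
	bool flags_valid;
};

/* Old shape: zero the whole struct, then populate it. */
static void demo_fill_flags_old(struct demo_kattr *fa, uint32_t flags)
{
	memset(fa, 0, sizeof(*fa));
	fa->flags_valid = true;
	fa->flags = flags;
	if (fa->flags & DEMO_SYNC_FL)
		fa->fsx_xflags |= DEMO_XFLAG_SYNC;
}

/* New shape: the caller owns initialization; only OR in translated bits. */
static void demo_fill_flags_new(struct demo_kattr *fa, uint32_t flags)
{
	fa->flags_valid = true;
	fa->flags = flags;
	if (fa->flags & DEMO_SYNC_FL)
		fa->fsx_xflags |= DEMO_XFLAG_SYNC;
}

/* A handler that reports casefolding before calling the fill helper. */
static uint32_t demo_handler(void (*fill)(struct demo_kattr *, uint32_t))
{
	struct demo_kattr fa = {0};	/* caller-side aggregate initializer */

	fa.fsx_xflags |= DEMO_XFLAG_CASEFOLD;
	fill(&fa, DEMO_SYNC_FL);
	return fa.fsx_xflags;
}
```

With the old helper the handler's pre-set bit is wiped by the memset; with the new one it survives alongside the translated bit, which is the property the later patches in this series depend on.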
Suggested-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/file_attr.c | 12 ++++--------
fs/xfs/xfs_ioctl.c | 2 +-
2 files changed, 5 insertions(+), 9 deletions(-)
diff --git a/fs/file_attr.c b/fs/file_attr.c
index da983e105d70..f429da66a317 100644
--- a/fs/file_attr.c
+++ b/fs/file_attr.c
@@ -15,12 +15,10 @@
* @fa: fileattr pointer
* @xflags: FS_XFLAG_* flags
*
- * Set ->fsx_xflags, ->fsx_valid and ->flags (translated xflags). All
- * other fields are zeroed.
+ * Set ->fsx_xflags, ->fsx_valid and ->flags (translated xflags).
*/
void fileattr_fill_xflags(struct file_kattr *fa, u32 xflags)
{
- memset(fa, 0, sizeof(*fa));
fa->fsx_valid = true;
fa->fsx_xflags = xflags;
if (fa->fsx_xflags & FS_XFLAG_IMMUTABLE)
@@ -48,11 +46,9 @@ EXPORT_SYMBOL(fileattr_fill_xflags);
* @flags: FS_*_FL flags
*
* Set ->flags, ->flags_valid and ->fsx_xflags (translated flags).
- * All other fields are zeroed.
*/
void fileattr_fill_flags(struct file_kattr *fa, u32 flags)
{
- memset(fa, 0, sizeof(*fa));
fa->flags_valid = true;
fa->flags = flags;
if (fa->flags & FS_SYNC_FL)
@@ -325,7 +321,7 @@ int ioctl_setflags(struct file *file, unsigned int __user *argp)
{
struct mnt_idmap *idmap = file_mnt_idmap(file);
struct dentry *dentry = file->f_path.dentry;
- struct file_kattr fa;
+ struct file_kattr fa = {};
unsigned int flags;
int err;
@@ -357,7 +353,7 @@ int ioctl_fssetxattr(struct file *file, void __user *argp)
{
struct mnt_idmap *idmap = file_mnt_idmap(file);
struct dentry *dentry = file->f_path.dentry;
- struct file_kattr fa;
+ struct file_kattr fa = {};
int err;
err = copy_fsxattr_from_user(&fa, argp);
@@ -431,7 +427,7 @@ SYSCALL_DEFINE5(file_setattr, int, dfd, const char __user *, filename,
struct path filepath __free(path_put) = {};
unsigned int lookup_flags = 0;
struct file_attr fattr;
- struct file_kattr fa;
+ struct file_kattr fa = {};
int error;
BUILD_BUG_ON(sizeof(struct file_attr) < FILE_ATTR_SIZE_VER0);
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 46e234863644..ed9b4846c05f 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -517,7 +517,7 @@ xfs_ioc_fsgetxattra(
xfs_inode_t *ip,
void __user *arg)
{
- struct file_kattr fa;
+ struct file_kattr fa = {};
xfs_ilock(ip, XFS_ILOCK_SHARED);
xfs_fill_fsxattr(ip, XFS_ATTR_FORK, &fa);
--
2.53.0
* [PATCH v11 02/15] fs: Add case sensitivity flags to file_kattr
From: Chuck Lever @ 2026-04-25 1:53 UTC (permalink / raw)
To: Al Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Darrick J. Wong, Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-0-de5619beddaf@oracle.com>
From: Chuck Lever <chuck.lever@oracle.com>
Enable upper layers such as NFSD to retrieve case sensitivity
information from file systems by adding FS_XFLAG_CASEFOLD and
FS_XFLAG_CASENONPRESERVING flags.
Filesystems report case-insensitive or case-nonpreserving behavior
by setting these flags directly in fa->fsx_xflags. The default
(flags unset) indicates POSIX semantics: case-sensitive and
case-preserving. Both flags are added to FS_XFLAG_RDONLY_MASK so
FS_IOC_FSSETXATTR silently strips them, keeping the new xflags
strictly a reporting interface. Callers that want to toggle
casefolding continue to use FS_IOC_SETFLAGS with FS_CASEFOLD_FL,
the established UAPI on filesystems that support the operation
(ext4 and f2fs on empty directories).
Case sensitivity information is exported to userspace via the
fsx_xflags field of the FS_IOC_FSGETXATTR ioctl and the fa_xflags
field of the file_getattr() system call.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
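Note for readers (not part of the commit): a minimal userspace sketch
of how a consumer might interpret the two new bits once they arrive in
an xflags word. The flag values are copied from this patch. The names
case_behavior, case_behavior_from_xflags(), strip_readonly_xflags()
and the SKETCH_* macros are hypothetical and exist only for this
illustration; the mask imitates the effect of FS_XFLAG_RDONLY_MASK and
is not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from this patch; guarded in case newer UAPI headers
 * already provide them. */
#ifndef FS_XFLAG_CASEFOLD
#define FS_XFLAG_CASEFOLD		0x00040000u
#endif
#ifndef FS_XFLAG_CASENONPRESERVING
#define FS_XFLAG_CASENONPRESERVING	0x00080000u
#endif

/* Illustrative copies of the other read-only bits (assumed names). */
#define SKETCH_XFLAG_PREALLOC	0x00000002u
#define SKETCH_XFLAG_VERITY	0x00020000u
#define SKETCH_XFLAG_HASATTR	0x80000000u

#define SKETCH_RDONLY_MASK \
	(SKETCH_XFLAG_PREALLOC | SKETCH_XFLAG_HASATTR | \
	 SKETCH_XFLAG_VERITY | \
	 FS_XFLAG_CASEFOLD | FS_XFLAG_CASENONPRESERVING)

static_assert((FS_XFLAG_CASEFOLD & FS_XFLAG_CASENONPRESERVING) == 0,
	      "the two case flags must occupy distinct bits");

/* Hypothetical helper type, not part of the kernel UAPI. */
struct case_behavior {
	bool case_sensitive;
	bool case_preserving;
};

/* Neither bit set means POSIX semantics: sensitive and preserving. */
static struct case_behavior case_behavior_from_xflags(uint32_t xflags)
{
	struct case_behavior cb = {
		.case_sensitive  = !(xflags & FS_XFLAG_CASEFOLD),
		.case_preserving = !(xflags & FS_XFLAG_CASENONPRESERVING),
	};
	return cb;
}

/* Models the "silently strips them" behavior on the set path. */
static uint32_t strip_readonly_xflags(uint32_t requested)
{
	return requested & ~SKETCH_RDONLY_MASK;
}
```

With xflags == 0 both fields come out true, which is the POSIX default
the commit message describes. A caller that passes FS_XFLAG_CASEFOLD
to the set path would, under this model, see the bit dropped rather
than receive an error.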
fs/file_attr.c | 4 ++++
include/linux/fileattr.h | 3 ++-
include/uapi/linux/fs.h | 7 +++++++
3 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/fs/file_attr.c b/fs/file_attr.c
index f429da66a317..bfb00d256dd5 100644
--- a/fs/file_attr.c
+++ b/fs/file_attr.c
@@ -37,6 +37,8 @@ void fileattr_fill_xflags(struct file_kattr *fa, u32 xflags)
fa->flags |= FS_PROJINHERIT_FL;
if (fa->fsx_xflags & FS_XFLAG_VERITY)
fa->flags |= FS_VERITY_FL;
+ if (fa->fsx_xflags & FS_XFLAG_CASEFOLD)
+ fa->flags |= FS_CASEFOLD_FL;
}
EXPORT_SYMBOL(fileattr_fill_xflags);
@@ -67,6 +69,8 @@ void fileattr_fill_flags(struct file_kattr *fa, u32 flags)
fa->fsx_xflags |= FS_XFLAG_PROJINHERIT;
if (fa->flags & FS_VERITY_FL)
fa->fsx_xflags |= FS_XFLAG_VERITY;
+ if (fa->flags & FS_CASEFOLD_FL)
+ fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
}
EXPORT_SYMBOL(fileattr_fill_flags);
diff --git a/include/linux/fileattr.h b/include/linux/fileattr.h
index 3780904a63a6..58044b598016 100644
--- a/include/linux/fileattr.h
+++ b/include/linux/fileattr.h
@@ -16,7 +16,8 @@
/* Read-only inode flags */
#define FS_XFLAG_RDONLY_MASK \
- (FS_XFLAG_PREALLOC | FS_XFLAG_HASATTR | FS_XFLAG_VERITY)
+ (FS_XFLAG_PREALLOC | FS_XFLAG_HASATTR | FS_XFLAG_VERITY | \
+ FS_XFLAG_CASEFOLD | FS_XFLAG_CASENONPRESERVING)
/* Flags to indicate valid value of fsx_ fields */
#define FS_XFLAG_VALUES_MASK \
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 13f71202845e..2ea4c81df08f 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -254,6 +254,13 @@ struct file_attr {
#define FS_XFLAG_DAX 0x00008000 /* use DAX for IO */
#define FS_XFLAG_COWEXTSIZE 0x00010000 /* CoW extent size allocator hint */
#define FS_XFLAG_VERITY 0x00020000 /* fs-verity enabled */
+/*
+ * Case handling flags (read-only, cannot be set via ioctl).
+ * Default (neither set) indicates POSIX semantics: case-sensitive
+ * lookups and case-preserving storage.
+ */
+#define FS_XFLAG_CASEFOLD 0x00040000 /* case-insensitive lookups */
+#define FS_XFLAG_CASENONPRESERVING 0x00080000 /* case not preserved */
#define FS_XFLAG_HASATTR 0x80000000 /* no DIFLAG for this */
/* the read-only stuff doesn't really belong here, but any other place is
--
2.53.0
* [PATCH v11 03/15] fat: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-04-25 1:53 UTC (permalink / raw)
To: Al Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-0-de5619beddaf@oracle.com>
From: Chuck Lever <chuck.lever@oracle.com>
Report FAT's case sensitivity behavior via the FS_XFLAG_CASEFOLD
and FS_XFLAG_CASENONPRESERVING flags. FAT filesystems are
case-insensitive by default.
VFAT becomes case-sensitive when mounted with 'check=strict', and
MSDOS supports a 'nocase' mount option that selects case-sensitive
matching; check the relevant option when reporting case sensitivity.
VFAT long filename entries preserve case; without VFAT, only
uppercased 8.3 short names are stored. MSDOS with 'nocase' also
preserves case since the name-formatting code skips upcasing when
'nocase' is set. Check both options when reporting case preservation.
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
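Note for readers (not part of the commit): the decision made by
fat_fileattr_get() can be read as a small truth table. The sketch
below restates it as a pure function so the four cases are easy to
check. fat_case_xflags() and its plain parameters are hypothetical
stand-ins for the msdos_sb_info option fields the patch reads; the
flag values are copied from patch 02 of this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef FS_XFLAG_CASEFOLD
#define FS_XFLAG_CASEFOLD		0x00040000u
#endif
#ifndef FS_XFLAG_CASENONPRESERVING
#define FS_XFLAG_CASENONPRESERVING	0x00080000u
#endif

static_assert((FS_XFLAG_CASEFOLD & FS_XFLAG_CASENONPRESERVING) == 0,
	      "the two case flags must occupy distinct bits");

/*
 * Hypothetical restatement of the patch's logic.
 *   isvfat     - true for a vfat mount, false for msdos
 *   name_check - the 'check=' option; 's' means strict
 *   nocase     - the msdos 'nocase' option
 */
static uint32_t fat_case_xflags(bool isvfat, char name_check, bool nocase)
{
	bool case_sensitive = isvfat ? (name_check == 's') : nocase;
	uint32_t xflags = 0;

	if (!case_sensitive) {
		xflags |= FS_XFLAG_CASEFOLD;
		/* Only plain msdos stores uppercased 8.3 names. */
		if (!isvfat)
			xflags |= FS_XFLAG_CASENONPRESERVING;
	}
	return xflags;
}
```

Under this reading, a default vfat mount reports CASEFOLD only, a
default msdos mount reports both flags, and either filesystem with its
case-sensitive option selected reports neither.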
fs/fat/fat.h | 3 +++
fs/fat/file.c | 32 ++++++++++++++++++++++++++++++++
fs/fat/namei_msdos.c | 1 +
fs/fat/namei_vfat.c | 1 +
4 files changed, 37 insertions(+)
diff --git a/fs/fat/fat.h b/fs/fat/fat.h
index 5a58f0bf8ce8..99ed9228a677 100644
--- a/fs/fat/fat.h
+++ b/fs/fat/fat.h
@@ -10,6 +10,8 @@
#include <linux/fs_context.h>
#include <linux/fs_parser.h>
+struct file_kattr;
+
/*
* vfat shortname flags
*/
@@ -408,6 +410,7 @@ extern void fat_truncate_blocks(struct inode *inode, loff_t offset);
extern int fat_getattr(struct mnt_idmap *idmap,
const struct path *path, struct kstat *stat,
u32 request_mask, unsigned int flags);
+int fat_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
extern int fat_file_fsync(struct file *file, loff_t start, loff_t end,
int datasync);
diff --git a/fs/fat/file.c b/fs/fat/file.c
index becccdd2e501..5f0178fc2ede 100644
--- a/fs/fat/file.c
+++ b/fs/fat/file.c
@@ -17,6 +17,7 @@
#include <linux/fsnotify.h>
#include <linux/security.h>
#include <linux/falloc.h>
+#include <linux/fileattr.h>
#include "fat.h"
static long fat_fallocate(struct file *file, int mode,
@@ -398,6 +399,36 @@ void fat_truncate_blocks(struct inode *inode, loff_t offset)
fat_flush_inodes(inode->i_sb, inode, NULL);
}
+int fat_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+ struct msdos_sb_info *sbi = MSDOS_SB(dentry->d_sb);
+ bool case_sensitive;
+
+ /*
+ * FAT filesystems are case-insensitive by default. VFAT
+ * becomes case-sensitive when mounted with 'check=strict',
+ * which installs vfat_dentry_ops. MSDOS has no such option;
+ * its 'nocase' mount option selects case-sensitive matching.
+ *
+ * VFAT long filename entries preserve case. Without VFAT, only
+ * uppercased 8.3 short names are stored. MSDOS with 'nocase'
+ * also preserves case.
+ */
+ if (sbi->options.isvfat)
+ case_sensitive = sbi->options.name_check == 's';
+ else
+ case_sensitive = sbi->options.nocase;
+
+ if (!case_sensitive) {
+ fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+ fa->flags |= FS_CASEFOLD_FL;
+ if (!sbi->options.isvfat)
+ fa->fsx_xflags |= FS_XFLAG_CASENONPRESERVING;
+ }
+ return 0;
+}
+EXPORT_SYMBOL_GPL(fat_fileattr_get);
+
int fat_getattr(struct mnt_idmap *idmap, const struct path *path,
struct kstat *stat, u32 request_mask, unsigned int flags)
{
@@ -575,5 +606,6 @@ EXPORT_SYMBOL_GPL(fat_setattr);
const struct inode_operations fat_file_inode_operations = {
.setattr = fat_setattr,
.getattr = fat_getattr,
+ .fileattr_get = fat_fileattr_get,
.update_time = fat_update_time,
};
diff --git a/fs/fat/namei_msdos.c b/fs/fat/namei_msdos.c
index 4cc65f330fb7..0fd2971ad4b1 100644
--- a/fs/fat/namei_msdos.c
+++ b/fs/fat/namei_msdos.c
@@ -644,6 +644,7 @@ static const struct inode_operations msdos_dir_inode_operations = {
.rename = msdos_rename,
.setattr = fat_setattr,
.getattr = fat_getattr,
+ .fileattr_get = fat_fileattr_get,
.update_time = fat_update_time,
};
diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 918b3756674c..e909447873e3 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -1185,6 +1185,7 @@ static const struct inode_operations vfat_dir_inode_operations = {
.rename = vfat_rename2,
.setattr = fat_setattr,
.getattr = fat_getattr,
+ .fileattr_get = fat_fileattr_get,
.update_time = fat_update_time,
};
--
2.53.0
* [PATCH v11 04/15] exfat: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-04-25 1:53 UTC (permalink / raw)
To: Al Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-0-de5619beddaf@oracle.com>
From: Chuck Lever <chuck.lever@oracle.com>
Report exFAT's case sensitivity behavior via the FS_XFLAG_CASEFOLD
flag. exFAT is always case-insensitive (using an upcase table for
comparison) and always preserves case at rest.
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/exfat/exfat_fs.h | 2 ++
fs/exfat/file.c | 18 ++++++++++++++++--
fs/exfat/namei.c | 1 +
3 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/fs/exfat/exfat_fs.h b/fs/exfat/exfat_fs.h
index 89ef5368277f..aff4dcd4e75a 100644
--- a/fs/exfat/exfat_fs.h
+++ b/fs/exfat/exfat_fs.h
@@ -496,6 +496,8 @@ int exfat_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
int exfat_getattr(struct mnt_idmap *idmap, const struct path *path,
struct kstat *stat, unsigned int request_mask,
unsigned int query_flags);
+struct file_kattr;
+int exfat_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
int exfat_file_fsync(struct file *file, loff_t start, loff_t end, int datasync);
long exfat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
long exfat_compat_ioctl(struct file *filp, unsigned int cmd,
diff --git a/fs/exfat/file.c b/fs/exfat/file.c
index 354bdcfe4abc..91e5511945d1 100644
--- a/fs/exfat/file.c
+++ b/fs/exfat/file.c
@@ -14,6 +14,7 @@
#include <linux/writeback.h>
#include <linux/filelock.h>
#include <linux/falloc.h>
+#include <linux/fileattr.h>
#include "exfat_raw.h"
#include "exfat_fs.h"
@@ -323,6 +324,18 @@ int exfat_getattr(struct mnt_idmap *idmap, const struct path *path,
return 0;
}
+int exfat_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+ /*
+ * exFAT compares filenames through an upcase table, so lookup
+ * is always case-insensitive. Long names are stored in UTF-16
+ * with case intact; CASENONPRESERVING stays clear.
+ */
+ fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+ fa->flags |= FS_CASEFOLD_FL;
+ return 0;
+}
+
int exfat_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
struct iattr *attr)
{
@@ -817,6 +830,7 @@ const struct file_operations exfat_file_operations = {
};
const struct inode_operations exfat_file_inode_operations = {
- .setattr = exfat_setattr,
- .getattr = exfat_getattr,
+ .setattr = exfat_setattr,
+ .getattr = exfat_getattr,
+ .fileattr_get = exfat_fileattr_get,
};
diff --git a/fs/exfat/namei.c b/fs/exfat/namei.c
index 2c5636634b4a..94002e43db08 100644
--- a/fs/exfat/namei.c
+++ b/fs/exfat/namei.c
@@ -1311,4 +1311,5 @@ const struct inode_operations exfat_dir_inode_operations = {
.rename = exfat_rename,
.setattr = exfat_setattr,
.getattr = exfat_getattr,
+ .fileattr_get = exfat_fileattr_get,
};
--
2.53.0
* [PATCH v11 05/15] ntfs3: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-04-25 1:53 UTC (permalink / raw)
To: Al Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-0-de5619beddaf@oracle.com>
From: Chuck Lever <chuck.lever@oracle.com>
Report NTFS case sensitivity behavior via the FS_XFLAG_CASEFOLD
flag, which is set only when the volume is mounted with "nocase".
NTFS always preserves case at rest.
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/ntfs3/file.c | 25 +++++++++++++++++++++++++
fs/ntfs3/inode.c | 1 +
fs/ntfs3/namei.c | 2 ++
fs/ntfs3/ntfs_fs.h | 1 +
4 files changed, 29 insertions(+)
diff --git a/fs/ntfs3/file.c b/fs/ntfs3/file.c
index b041639ab406..447ea0f9b9d5 100644
--- a/fs/ntfs3/file.c
+++ b/fs/ntfs3/file.c
@@ -180,6 +180,30 @@ long ntfs_compat_ioctl(struct file *filp, u32 cmd, unsigned long arg)
}
#endif
+/*
+ * ntfs_fileattr_get - inode_operations::fileattr_get
+ */
+int ntfs_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+ struct inode *inode = d_inode(dentry);
+ struct ntfs_sb_info *sbi = inode->i_sb->s_fs_info;
+
+ /* Avoid any operation if inode is bad. */
+ if (unlikely(is_bad_ni(ntfs_i(inode))))
+ return -EINVAL;
+
+ /*
+ * NTFS always preserves case. Case sensitivity depends on the
+ * mount options: with "nocase", lookups are case-insensitive;
+ * otherwise they are case-sensitive.
+ */
+ if (sbi->options->nocase) {
+ fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+ fa->flags |= FS_CASEFOLD_FL;
+ }
+ return 0;
+}
+
/*
* ntfs_getattr - inode_operations::getattr
*/
@@ -1547,6 +1571,7 @@ const struct inode_operations ntfs_file_inode_operations = {
.get_acl = ntfs_get_acl,
.set_acl = ntfs_set_acl,
.fiemap = ntfs_fiemap,
+ .fileattr_get = ntfs_fileattr_get,
};
const struct file_operations ntfs_file_operations = {
diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c
index 42af1abe17f8..a5ff04c2efd3 100644
--- a/fs/ntfs3/inode.c
+++ b/fs/ntfs3/inode.c
@@ -2095,6 +2095,7 @@ const struct inode_operations ntfs_link_inode_operations = {
.get_link = ntfs_get_link,
.setattr = ntfs_setattr,
.listxattr = ntfs_listxattr,
+ .fileattr_get = ntfs_fileattr_get,
};
const struct address_space_operations ntfs_aops = {
diff --git a/fs/ntfs3/namei.c b/fs/ntfs3/namei.c
index b2af8f695e60..eb241d7796ba 100644
--- a/fs/ntfs3/namei.c
+++ b/fs/ntfs3/namei.c
@@ -518,6 +518,7 @@ const struct inode_operations ntfs_dir_inode_operations = {
.getattr = ntfs_getattr,
.listxattr = ntfs_listxattr,
.fiemap = ntfs_fiemap,
+ .fileattr_get = ntfs_fileattr_get,
};
const struct inode_operations ntfs_special_inode_operations = {
@@ -526,6 +527,7 @@ const struct inode_operations ntfs_special_inode_operations = {
.listxattr = ntfs_listxattr,
.get_acl = ntfs_get_acl,
.set_acl = ntfs_set_acl,
+ .fileattr_get = ntfs_fileattr_get,
};
const struct dentry_operations ntfs_dentry_ops = {
diff --git a/fs/ntfs3/ntfs_fs.h b/fs/ntfs3/ntfs_fs.h
index bbf3b6a1dcbe..41db22d652c4 100644
--- a/fs/ntfs3/ntfs_fs.h
+++ b/fs/ntfs3/ntfs_fs.h
@@ -529,6 +529,7 @@ bool dir_is_empty(struct inode *dir);
extern const struct file_operations ntfs_dir_operations;
/* Globals from file.c */
+int ntfs_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
int ntfs_getattr(struct mnt_idmap *idmap, const struct path *path,
struct kstat *stat, u32 request_mask, u32 flags);
int ntfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
--
2.53.0
* [PATCH v11 06/15] hfs: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-04-25 1:53 UTC (permalink / raw)
To: Al Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-0-de5619beddaf@oracle.com>
From: Chuck Lever <chuck.lever@oracle.com>
Report HFS case sensitivity behavior via the FS_XFLAG_CASEFOLD
flag. HFS is always case-insensitive (using Mac OS Roman case
folding) and always preserves case at rest.
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/hfs/dir.c | 1 +
fs/hfs/hfs_fs.h | 2 ++
fs/hfs/inode.c | 14 ++++++++++++++
3 files changed, 17 insertions(+)
diff --git a/fs/hfs/dir.c b/fs/hfs/dir.c
index f5e7efe924e7..c4c6e1623f55 100644
--- a/fs/hfs/dir.c
+++ b/fs/hfs/dir.c
@@ -328,4 +328,5 @@ const struct inode_operations hfs_dir_inode_operations = {
.rmdir = hfs_remove,
.rename = hfs_rename,
.setattr = hfs_inode_setattr,
+ .fileattr_get = hfs_fileattr_get,
};
diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
index ac0e83f77a0f..1b23448c9a48 100644
--- a/fs/hfs/hfs_fs.h
+++ b/fs/hfs/hfs_fs.h
@@ -177,6 +177,8 @@ extern int hfs_get_block(struct inode *inode, sector_t block,
extern const struct address_space_operations hfs_aops;
extern const struct address_space_operations hfs_btree_aops;
+struct file_kattr;
+int hfs_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
int hfs_write_begin(const struct kiocb *iocb, struct address_space *mapping,
loff_t pos, unsigned int len, struct folio **foliop,
void **fsdata);
diff --git a/fs/hfs/inode.c b/fs/hfs/inode.c
index 89b33a9d46d5..f41cc261684d 100644
--- a/fs/hfs/inode.c
+++ b/fs/hfs/inode.c
@@ -18,6 +18,7 @@
#include <linux/uio.h>
#include <linux/xattr.h>
#include <linux/blkdev.h>
+#include <linux/fileattr.h>
#include "hfs_fs.h"
#include "btree.h"
@@ -699,6 +700,18 @@ static int hfs_file_fsync(struct file *filp, loff_t start, loff_t end,
return ret;
}
+int hfs_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+ /*
+ * HFS compares filenames using Mac OS Roman case folding, so
+ * lookup is always case-insensitive. Names are stored on disk
+ * with case intact; CASENONPRESERVING stays clear.
+ */
+ fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+ fa->flags |= FS_CASEFOLD_FL;
+ return 0;
+}
+
static const struct file_operations hfs_file_operations = {
.llseek = generic_file_llseek,
.read_iter = generic_file_read_iter,
@@ -715,4 +728,5 @@ static const struct inode_operations hfs_file_inode_operations = {
.lookup = hfs_file_lookup,
.setattr = hfs_inode_setattr,
.listxattr = generic_listxattr,
+ .fileattr_get = hfs_fileattr_get,
};
--
2.53.0
* [PATCH v11 07/15] hfsplus: Report case sensitivity in fileattr_get
From: Chuck Lever @ 2026-04-25 1:53 UTC (permalink / raw)
To: Al Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-0-de5619beddaf@oracle.com>
From: Chuck Lever <chuck.lever@oracle.com>
Add case sensitivity reporting to the existing hfsplus_fileattr_get()
function via the FS_XFLAG_CASEFOLD flag. HFS+ always preserves case
at rest.
Case sensitivity depends on how the volume was formatted. Standard
HFS+ volumes are case-insensitive; HFSX volumes may be either
case-sensitive or case-insensitive. The HFSPLUS_SB_CASEFOLD
superblock flag is set for the case-insensitive variants.
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/hfsplus/inode.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c
index d05891ec492e..38b6eb659a79 100644
--- a/fs/hfsplus/inode.c
+++ b/fs/hfsplus/inode.c
@@ -740,6 +740,7 @@ int hfsplus_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
{
struct inode *inode = d_inode(dentry);
struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
+ struct hfsplus_sb_info *sbi = HFSPLUS_SB(inode->i_sb);
unsigned int flags = 0;
if (inode->i_flags & S_IMMUTABLE)
@@ -751,6 +752,17 @@ int hfsplus_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
fileattr_fill_flags(fa, flags);
+ /*
+ * HFS+ always preserves case at rest. Standard HFS+ volumes
+ * are case-insensitive; HFSX volumes may be either
+ * case-sensitive or case-insensitive depending on how they
+ * were formatted. HFSPLUS_SB_CASEFOLD is set in both
+ * case-insensitive variants.
+ */
+ if (test_bit(HFSPLUS_SB_CASEFOLD, &sbi->flags)) {
+ fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+ fa->flags |= FS_CASEFOLD_FL;
+ }
return 0;
}
--
2.53.0
* [PATCH v11 08/15] xfs: Report case sensitivity in fileattr_get
From: Chuck Lever @ 2026-04-25 1:53 UTC (permalink / raw)
To: Al Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-0-de5619beddaf@oracle.com>
From: Chuck Lever <chuck.lever@oracle.com>
Upper layers such as NFSD need to query whether a filesystem
is case-sensitive. Add FS_XFLAG_CASEFOLD to xfs_ip2xflags()
when the filesystem is formatted with the ASCIICI feature
flag. This serves both FS_IOC_FSGETXATTR (via xfs_fill_fsxattr() in
xfs_fileattr_get()) and XFS_IOC_BULKSTAT (which populates bs_xflags
directly from xfs_ip2xflags()), so bulkstat consumers and per-inode
queries see a consistent view of the filesystem's case-folding
behavior.
XFS always preserves case. XFS is case-sensitive by default, but
supports ASCII case-insensitive lookups when formatted with the
ASCIICI feature flag.
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
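Note for readers (not part of the commit): the patch leaves the two
views of the attribute deliberately out of step, with CASEFOLD visible
in the xflags word but hidden from the legacy flags word. The sketch
below models that split. kattr_sketch and xfs_report_sketch() are
hypothetical names, the CASEFOLD mapping imitates what patch 02 adds
to fileattr_fill_xflags(), and all other fields and flags are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef FS_XFLAG_CASEFOLD
#define FS_XFLAG_CASEFOLD	0x00040000u
#endif
#ifndef FS_CASEFOLD_FL
#define FS_CASEFOLD_FL		0x40000000u
#endif

static_assert(FS_XFLAG_CASEFOLD != FS_CASEFOLD_FL,
	      "xflags and legacy flags use different bit positions");

/* Hypothetical two-field stand-in for struct file_kattr. */
struct kattr_sketch {
	uint32_t fsx_xflags;	/* FS_IOC_FSGETXATTR view */
	uint32_t flags;		/* FS_IOC_GETFLAGS (chattr) view */
};

static struct kattr_sketch xfs_report_sketch(bool has_asciici)
{
	struct kattr_sketch fa = { 0, 0 };

	/* xfs_ip2xflags(): set the bit on ASCIICI filesystems. */
	if (has_asciici)
		fa.fsx_xflags |= FS_XFLAG_CASEFOLD;

	/* fileattr_fill_xflags(): mirror xflags into legacy flags. */
	if (fa.fsx_xflags & FS_XFLAG_CASEFOLD)
		fa.flags |= FS_CASEFOLD_FL;

	/* xfs_fill_fsxattr(): hide the read-only bit from chattr. */
	fa.flags &= ~FS_CASEFOLD_FL;
	return fa;
}
```

The effect is that a read-modify-write of the legacy flags never hands
FS_CASEFOLD_FL back to the set path, while xflags consumers still see
the filesystem's case-folding behavior.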
fs/xfs/libxfs/xfs_inode_util.c | 2 ++
fs/xfs/xfs_ioctl.c | 7 +++++++
2 files changed, 9 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_inode_util.c b/fs/xfs/libxfs/xfs_inode_util.c
index 551fa51befb6..82be54b6f8d3 100644
--- a/fs/xfs/libxfs/xfs_inode_util.c
+++ b/fs/xfs/libxfs/xfs_inode_util.c
@@ -130,6 +130,8 @@ xfs_ip2xflags(
if (xfs_inode_has_attr_fork(ip))
flags |= FS_XFLAG_HASATTR;
+ if (xfs_has_asciici(ip->i_mount))
+ flags |= FS_XFLAG_CASEFOLD;
return flags;
}
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index ed9b4846c05f..5a58fb0bad2b 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -472,6 +472,13 @@ xfs_fill_fsxattr(
fileattr_fill_xflags(fa, xfs_ip2xflags(ip));
+ /*
+ * FS_XFLAG_CASEFOLD is read-only; hide it from the legacy
+ * flags view so chattr's RMW cycle does not pass it back to
+ * xfs_fileattr_set().
+ */
+ fa->flags &= ~FS_CASEFOLD_FL;
+
if (ip->i_diflags & XFS_DIFLAG_EXTSIZE) {
fa->fsx_extsize = XFS_FSB_TO_B(mp, ip->i_extsize);
} else if (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) {
--
2.53.0
* [PATCH v11 09/15] cifs: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-04-25 1:53 UTC (permalink / raw)
To: Al Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Steve French, Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-0-de5619beddaf@oracle.com>
From: Chuck Lever <chuck.lever@oracle.com>
Upper layers such as NFSD need a way to query whether a filesystem
handles filenames in a case-sensitive manner. Report CIFS/SMB case
handling behavior via FS_XFLAG_CASEFOLD and
FS_XFLAG_CASENONPRESERVING.
The authoritative source is the server itself: at mount time CIFS
issues QueryFSInfo(FS_ATTRIBUTE_INFORMATION) and caches the reply
on the tcon. That reply carries FILE_CASE_SENSITIVE_SEARCH and
FILE_CASE_PRESERVED_NAMES, which reflect whatever case handling
the share actually implements after SMB3.1.1 POSIX extensions
negotiation. Translating those two bits into the VFS flags lets
cifs_fileattr_get report what the server advertises rather than
what the client was asked to pretend.
QueryFSInfo is best-effort; the mount completes even if the server
does not answer. MaxPathNameComponentLength is zero in that case
and is used as the "no reply received" sentinel. When no reply is
available, fall back to the nocase mount option so that the reported
behavior agrees with the dentry comparison operations installed on
the superblock.
The callback is registered in all three inode_operations structures
(directory, file, and symlink) to ensure consistent reporting across
all inode types.
Registering fileattr_get routes FS_IOC_GETFLAGS through
vfs_fileattr_get() and short-circuits the syscall's fallback to
cifs_ioctl(). That fallback invoked CIFSGetExtAttr() under
CONFIG_CIFS_POSIX and CONFIG_CIFS_ALLOW_INSECURE_LEGACY on servers
advertising CIFS_UNIX_EXTATTR_CAP, surfacing the SMB1 Unix-extension
immutable, append, and nodump bits. cifs_fileattr_get carries over
only FS_COMPR_FL from cached cifsAttrs; the SMB1 extattr fetch is
not reproduced. SMB1 is deprecated, and preserving a path tied to
an insecure legacy dialect does not justify acquiring a netfid from
within a dentry-only callback.
Acked-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
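Note for readers (not part of the commit): the precedence described
above (server attributes first, nocase only as a fallback) can be
sketched as a pure function. cifs_case_xflags() and its parameters are
hypothetical; the two FILE_CASE_* values are the attribute bits as
defined for FileFsAttributeInformation, repeated here so the sketch
builds on its own, and the xflag values are copied from patch 02.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef FS_XFLAG_CASEFOLD
#define FS_XFLAG_CASEFOLD		0x00040000u
#endif
#ifndef FS_XFLAG_CASENONPRESERVING
#define FS_XFLAG_CASENONPRESERVING	0x00080000u
#endif
#ifndef FILE_CASE_SENSITIVE_SEARCH
#define FILE_CASE_SENSITIVE_SEARCH	0x00000001u
#endif
#ifndef FILE_CASE_PRESERVED_NAMES
#define FILE_CASE_PRESERVED_NAMES	0x00000002u
#endif

static_assert((FILE_CASE_SENSITIVE_SEARCH & FILE_CASE_PRESERVED_NAMES) == 0,
	      "server attribute bits must be distinct");

/*
 * attrs        - cached FS_ATTRIBUTE_INFORMATION attribute bits
 * max_name_len - MaxPathNameComponentLength; 0 means "no reply"
 * nocase       - client-side nocase setting, used only as fallback
 */
static uint32_t cifs_case_xflags(uint32_t attrs, uint32_t max_name_len,
				 bool nocase)
{
	uint32_t xflags = 0;

	if (max_name_len == 0)
		return nocase ? FS_XFLAG_CASEFOLD : 0;

	if (!(attrs & FILE_CASE_SENSITIVE_SEARCH))
		xflags |= FS_XFLAG_CASEFOLD;
	if (!(attrs & FILE_CASE_PRESERVED_NAMES))
		xflags |= FS_XFLAG_CASENONPRESERVING;
	return xflags;
}
```

Once the server has answered, the nocase argument has no influence on
the result in this model, which matches the stated intent of reporting
what the server advertises.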
fs/smb/client/cifsfs.c | 42 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+)
diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
index 2025739f070a..d71755b59b5b 100644
--- a/fs/smb/client/cifsfs.c
+++ b/fs/smb/client/cifsfs.c
@@ -30,6 +30,7 @@
#include <linux/xattr.h>
#include <linux/mm.h>
#include <linux/key-type.h>
+#include <linux/fileattr.h>
#include <uapi/linux/magic.h>
#include <net/ipv6.h>
#include "cifsfs.h"
@@ -1199,6 +1200,44 @@ struct file_system_type smb3_fs_type = {
MODULE_ALIAS_FS("smb3");
MODULE_ALIAS("smb3");
+static int cifs_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+ struct cifs_sb_info *cifs_sb = CIFS_SB(dentry->d_sb);
+ struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb);
+ u32 attrs = le32_to_cpu(tcon->fsAttrInfo.Attributes);
+
+ /* Preserve FS_COMPR_FL previously reported by cifs_ioctl(). */
+ if (CIFS_I(d_inode(dentry))->cifsAttrs & ATTR_COMPRESSED)
+ fa->flags |= FS_COMPR_FL;
+
+ /*
+ * The server's FS_ATTRIBUTE_INFORMATION response, cached on
+ * the tcon at mount, reflects the share's case-handling
+ * semantics after any POSIX extensions negotiation. Prefer
+ * it over the client-local nocase mount option, which only
+ * governs dentry comparison on this superblock.
+ *
+ * QueryFSInfo is best-effort at mount; when it did not
+ * populate fsAttrInfo, MaxPathNameComponentLength remains
+ * zero. In that case fall back to nocase so the reporting
+ * matches the comparison behavior installed on the sb.
+ */
+ if (le32_to_cpu(tcon->fsAttrInfo.MaxPathNameComponentLength) == 0) {
+ if (tcon->nocase) {
+ fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+ fa->flags |= FS_CASEFOLD_FL;
+ }
+ return 0;
+ }
+ if (!(attrs & FILE_CASE_SENSITIVE_SEARCH)) {
+ fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+ fa->flags |= FS_CASEFOLD_FL;
+ }
+ if (!(attrs & FILE_CASE_PRESERVED_NAMES))
+ fa->fsx_xflags |= FS_XFLAG_CASENONPRESERVING;
+ return 0;
+}
+
const struct inode_operations cifs_dir_inode_ops = {
.create = cifs_create,
.atomic_open = cifs_atomic_open,
@@ -1217,6 +1256,7 @@ const struct inode_operations cifs_dir_inode_ops = {
.listxattr = cifs_listxattr,
.get_acl = cifs_get_acl,
.set_acl = cifs_set_acl,
+ .fileattr_get = cifs_fileattr_get,
};
const struct inode_operations cifs_file_inode_ops = {
@@ -1227,6 +1267,7 @@ const struct inode_operations cifs_file_inode_ops = {
.fiemap = cifs_fiemap,
.get_acl = cifs_get_acl,
.set_acl = cifs_set_acl,
+ .fileattr_get = cifs_fileattr_get,
};
const char *cifs_get_link(struct dentry *dentry, struct inode *inode,
@@ -1261,6 +1302,7 @@ const struct inode_operations cifs_symlink_inode_ops = {
.setattr = cifs_setattr,
.permission = cifs_permission,
.listxattr = cifs_listxattr,
+ .fileattr_get = cifs_fileattr_get,
};
/*
--
2.53.0
* [PATCH v11 10/15] nfs: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-04-25 1:53 UTC (permalink / raw)
To: Al Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-0-de5619beddaf@oracle.com>
From: Chuck Lever <chuck.lever@oracle.com>
An NFS server re-exporting an NFS mount point needs to report
the case sensitivity behavior of the underlying filesystem to
its clients. NFSD's attribute encoder obtains that information
by calling vfs_fileattr_get() on the lower filesystem, so the
NFS client must implement fileattr_get to surface what it
learned from its own server.
The NFS client already retrieves case sensitivity information
from servers during mount via PATHCONF (NFSv3) or the
FATTR4_CASE_INSENSITIVE/FATTR4_CASE_PRESERVING attributes
(NFSv4). Expose this information through fileattr_get by
reporting the FS_XFLAG_CASEFOLD and FS_XFLAG_CASENONPRESERVING
flags. NFSv2 lacks PATHCONF support, so mounts using that protocol
version default to standard POSIX behavior: case-sensitive and
case-preserving.
PATHCONF is now invoked unconditionally for NFSv2 and NFSv3 mounts
so the case-sensitivity capabilities are established even when
the user pins server->namelen with the namlen= mount option. That
option is orthogonal to case handling, and skipping PATHCONF
because namelen was already known would leave the caps unset.
The two capability bits have opposite polarity relative to the
protocol fields they are derived from, because the POSIX defaults
of those fields differ: case_insensitive defaults to false and
case_preserving defaults to true. Most servers are case-sensitive
and case-preserving, matching "neither xflag set."
NFS_CAP_CASE_INSENSITIVE is set only when the server affirms case
insensitivity, so "server said no" and "server did not answer"
both collapse to the case-sensitive default.
NFS_CAP_CASE_NONPRESERVING is set only when the server affirms
that it does not preserve case, so that silence or a missing
attribute lands on the case-preserving default. The NFSv4 probe
checks res.attr_bitmask[0] to distinguish "server said false" from
"server omitted the attribute" before setting the bit.
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
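Note for readers (not part of the commit): the polarity argument above
is easier to follow as code. The sketch below models the NFSv2/NFSv3
PATHCONF path only. The SKETCH_CAP_* bit positions are invented for
illustration and are not the kernel's NFS_CAP_* values, and
caps_from_pathconf() and xflags_from_caps() are hypothetical names;
the xflag values are copied from patch 02.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef FS_XFLAG_CASEFOLD
#define FS_XFLAG_CASEFOLD		0x00040000u
#endif
#ifndef FS_XFLAG_CASENONPRESERVING
#define FS_XFLAG_CASENONPRESERVING	0x00080000u
#endif

/* Assumed bit positions, for this sketch only. */
#define SKETCH_CAP_CASE_INSENSITIVE	(1u << 0)
#define SKETCH_CAP_CASE_NONPRESERVING	(1u << 1)

static_assert((SKETCH_CAP_CASE_INSENSITIVE &
	       SKETCH_CAP_CASE_NONPRESERVING) == 0,
	      "capability bits must be distinct");

/*
 * answered         - the pathconf call succeeded
 * case_insensitive - protocol field, POSIX default false
 * case_preserving  - protocol field, POSIX default true
 */
static uint32_t caps_from_pathconf(bool answered, bool case_insensitive,
				   bool case_preserving)
{
	uint32_t caps = 0;

	if (!answered)
		return caps;	/* silence keeps the POSIX defaults */
	if (case_insensitive)
		caps |= SKETCH_CAP_CASE_INSENSITIVE;
	if (!case_preserving)	/* inverted: cap marks the non-default */
		caps |= SKETCH_CAP_CASE_NONPRESERVING;
	return caps;
}

static uint32_t xflags_from_caps(uint32_t caps)
{
	uint32_t xflags = 0;

	if (caps & SKETCH_CAP_CASE_INSENSITIVE)
		xflags |= FS_XFLAG_CASEFOLD;
	if (caps & SKETCH_CAP_CASE_NONPRESERVING)
		xflags |= FS_XFLAG_CASENONPRESERVING;
	return xflags;
}
```

Because each capability bit marks the non-default answer, a zeroed
caps word always maps to "neither xflag set", whether the server said
so explicitly or did not answer at all.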
fs/nfs/client.c | 22 +++++++++++++++++-----
fs/nfs/inode.c | 23 +++++++++++++++++++++++
fs/nfs/internal.h | 3 +++
fs/nfs/nfs3proc.c | 2 ++
fs/nfs/nfs3xdr.c | 7 +++++--
fs/nfs/nfs4proc.c | 7 +++++--
fs/nfs/proc.c | 3 +++
fs/nfs/symlink.c | 3 +++
include/linux/nfs_fs_sb.h | 2 +-
include/linux/nfs_xdr.h | 2 ++
10 files changed, 64 insertions(+), 10 deletions(-)
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index be02bb227741..2f4d41ecfa71 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -933,15 +933,27 @@ static int nfs_probe_fsinfo(struct nfs_server *server, struct nfs_fh *mntfh, str
nfs_server_set_fsinfo(server, &fsinfo);
- /* Get some general file system info */
- if (server->namelen == 0) {
- struct nfs_pathconf pathinfo;
+ {
+ struct nfs_pathconf pathinfo = { };
pathinfo.fattr = fattr;
nfs_fattr_init(fattr);
- if (clp->rpc_ops->pathconf(server, mntfh, &pathinfo) >= 0)
- server->namelen = pathinfo.max_namelen;
+ if (clp->rpc_ops->pathconf(server, mntfh, &pathinfo) >= 0) {
+ if (server->namelen == 0)
+ server->namelen = pathinfo.max_namelen;
+ /*
+ * NFSv4 PATHCONF does not carry the case-sensitivity
+ * fields; those caps are set from FATTR4_CASE_*
+ * attributes during the set_capabilities probe.
+ */
+ if (clp->rpc_ops->version < 4) {
+ if (pathinfo.case_insensitive)
+ server->caps |= NFS_CAP_CASE_INSENSITIVE;
+ if (!pathinfo.case_preserving)
+ server->caps |= NFS_CAP_CASE_NONPRESERVING;
+ }
+ }
}
if (clp->rpc_ops->discover_trunking != NULL &&
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 98a8f0de1199..bce2466552c4 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -41,6 +41,7 @@
#include <linux/freezer.h>
#include <linux/uaccess.h>
#include <linux/iversion.h>
+#include <linux/fileattr.h>
#include "nfs4_fs.h"
#include "callback.h"
@@ -1101,6 +1102,28 @@ int nfs_getattr(struct mnt_idmap *idmap, const struct path *path,
}
EXPORT_SYMBOL_GPL(nfs_getattr);
+int nfs_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+ struct inode *inode = d_inode(dentry);
+
+ /*
+ * Case handling is a property of the exported filesystem on the
+ * NFS server, reported to the client at mount via PATHCONF
+ * (NFSv3) or FATTR4_CASE_INSENSITIVE / FATTR4_CASE_PRESERVING
+ * (NFSv4). Unlike filesystems that always preserve case, an NFS
+ * mount may front a backend that does not, so both flags can
+ * appear.
+ */
+ if (nfs_server_capable(inode, NFS_CAP_CASE_INSENSITIVE)) {
+ fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+ fa->flags |= FS_CASEFOLD_FL;
+ }
+ if (nfs_server_capable(inode, NFS_CAP_CASE_NONPRESERVING))
+ fa->fsx_xflags |= FS_XFLAG_CASENONPRESERVING;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(nfs_fileattr_get);
+
static void nfs_init_lock_context(struct nfs_lock_context *l_ctx)
{
refcount_set(&l_ctx->count, 1);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index fc5456377160..309d3f679bb3 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -449,6 +449,9 @@ extern void nfs_set_cache_invalid(struct inode *inode, unsigned long flags);
extern bool nfs_check_cache_invalid(struct inode *, unsigned long);
extern int nfs_wait_bit_killable(struct wait_bit_key *key, int mode);
+struct file_kattr;
+int nfs_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
+
#if IS_ENABLED(CONFIG_NFS_LOCALIO)
/* localio.c */
struct nfs_local_dio {
diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index 95d7cd564b74..b80d0c5efc27 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -1053,6 +1053,7 @@ static const struct inode_operations nfs3_dir_inode_operations = {
.permission = nfs_permission,
.getattr = nfs_getattr,
.setattr = nfs_setattr,
+ .fileattr_get = nfs_fileattr_get,
#ifdef CONFIG_NFS_V3_ACL
.listxattr = nfs3_listxattr,
.get_inode_acl = nfs3_get_acl,
@@ -1064,6 +1065,7 @@ static const struct inode_operations nfs3_file_inode_operations = {
.permission = nfs_permission,
.getattr = nfs_getattr,
.setattr = nfs_setattr,
+ .fileattr_get = nfs_fileattr_get,
#ifdef CONFIG_NFS_V3_ACL
.listxattr = nfs3_listxattr,
.get_inode_acl = nfs3_get_acl,
diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
index e17d72908412..e745e78faab0 100644
--- a/fs/nfs/nfs3xdr.c
+++ b/fs/nfs/nfs3xdr.c
@@ -2276,8 +2276,11 @@ static int decode_pathconf3resok(struct xdr_stream *xdr,
if (unlikely(!p))
return -EIO;
result->max_link = be32_to_cpup(p++);
- result->max_namelen = be32_to_cpup(p);
- /* ignore remaining fields */
+ result->max_namelen = be32_to_cpup(p++);
+ p++; /* ignore no_trunc */
+ p++; /* ignore chown_restricted */
+ result->case_insensitive = be32_to_cpup(p++) != 0;
+ result->case_preserving = be32_to_cpup(p) != 0;
return 0;
}
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index d839a97df822..034e3e87e863 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3944,8 +3944,9 @@ static int _nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *f
server->caps |= NFS_CAP_SYMLINKS;
if (res.case_insensitive)
server->caps |= NFS_CAP_CASE_INSENSITIVE;
- if (res.case_preserving)
- server->caps |= NFS_CAP_CASE_PRESERVING;
+ if ((res.attr_bitmask[0] & FATTR4_WORD0_CASE_PRESERVING) &&
+ !res.case_preserving)
+ server->caps |= NFS_CAP_CASE_NONPRESERVING;
#ifdef CONFIG_NFS_V4_SECURITY_LABEL
if (res.attr_bitmask[2] & FATTR4_WORD2_SECURITY_LABEL)
server->caps |= NFS_CAP_SECURITY_LABEL;
@@ -10598,6 +10599,7 @@ static const struct inode_operations nfs4_dir_inode_operations = {
.getattr = nfs_getattr,
.setattr = nfs_setattr,
.listxattr = nfs4_listxattr,
+ .fileattr_get = nfs_fileattr_get,
};
static const struct inode_operations nfs4_file_inode_operations = {
@@ -10605,6 +10607,7 @@ static const struct inode_operations nfs4_file_inode_operations = {
.getattr = nfs_getattr,
.setattr = nfs_setattr,
.listxattr = nfs4_listxattr,
+ .fileattr_get = nfs_fileattr_get,
};
static struct nfs_server *nfs4_clone_server(struct nfs_server *source,
diff --git a/fs/nfs/proc.c b/fs/nfs/proc.c
index 70795684b8e8..03c2c1f31be9 100644
--- a/fs/nfs/proc.c
+++ b/fs/nfs/proc.c
@@ -598,6 +598,7 @@ nfs_proc_pathconf(struct nfs_server *server, struct nfs_fh *fhandle,
{
info->max_link = 0;
info->max_namelen = NFS2_MAXNAMLEN;
+ info->case_preserving = true;
return 0;
}
@@ -718,12 +719,14 @@ static const struct inode_operations nfs_dir_inode_operations = {
.permission = nfs_permission,
.getattr = nfs_getattr,
.setattr = nfs_setattr,
+ .fileattr_get = nfs_fileattr_get,
};
static const struct inode_operations nfs_file_inode_operations = {
.permission = nfs_permission,
.getattr = nfs_getattr,
.setattr = nfs_setattr,
+ .fileattr_get = nfs_fileattr_get,
};
const struct nfs_rpc_ops nfs_v2_clientops = {
diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c
index 58146e935402..74a072896f8d 100644
--- a/fs/nfs/symlink.c
+++ b/fs/nfs/symlink.c
@@ -22,6 +22,8 @@
#include <linux/mm.h>
#include <linux/string.h>
+#include "internal.h"
+
/* Symlink caching in the page cache is even more simplistic
* and straight-forward than readdir caching.
*/
@@ -74,4 +76,5 @@ const struct inode_operations nfs_symlink_inode_operations = {
.get_link = nfs_get_link,
.getattr = nfs_getattr,
.setattr = nfs_setattr,
+ .fileattr_get = nfs_fileattr_get,
};
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 4daee27fa5eb..34d294774f8c 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -306,7 +306,7 @@ struct nfs_server {
#define NFS_CAP_ATOMIC_OPEN (1U << 4)
#define NFS_CAP_LGOPEN (1U << 5)
#define NFS_CAP_CASE_INSENSITIVE (1U << 6)
-#define NFS_CAP_CASE_PRESERVING (1U << 7)
+#define NFS_CAP_CASE_NONPRESERVING (1U << 7)
#define NFS_CAP_REBOOT_LAYOUTRETURN (1U << 8)
#define NFS_CAP_OFFLOAD_STATUS (1U << 9)
#define NFS_CAP_ZERO_RANGE (1U << 10)
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index ff1f12aa73d2..7c2057e40f99 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -182,6 +182,8 @@ struct nfs_pathconf {
struct nfs_fattr *fattr; /* Post-op attributes */
__u32 max_link; /* max # of hard links */
__u32 max_namelen; /* max name length */
+ bool case_insensitive;
+ bool case_preserving;
};
struct nfs4_change_info {
--
2.53.0
* [PATCH v11 11/15] vboxsf: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-04-25 1:53 UTC (permalink / raw)
To: Al Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-0-de5619beddaf@oracle.com>
From: Chuck Lever <chuck.lever@oracle.com>
Upper layers such as NFSD need a way to query whether a
filesystem handles filenames in a case-sensitive manner. Report
VirtualBox shared folder case handling behavior via the
FS_XFLAG_CASEFOLD flag.
The case sensitivity property is queried from the VirtualBox host
service at mount time and cached in struct vboxsf_sbi. The host
determines case sensitivity based on the underlying host filesystem
(for example, Windows NTFS is case-insensitive while Linux ext4 is
case-sensitive).
VirtualBox shared folders always preserve filename case exactly
as provided by the guest. The host interface does not expose a
separate case-preserving property; leaving
FS_XFLAG_CASENONPRESERVING unset reports the POSIX-default
case-preserving behavior, which matches vboxsf semantics.
The callback is registered in all three inode_operations
structures (directory, file, and symlink) to ensure consistent
reporting across all inode types.
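
As a rough illustration of the query-once-and-cache behavior described
above, the following sketch models the mount-time step in plain C. The
struct and function names (sketch_volinfo, sketch_sbi, sketch_cache_case)
are hypothetical stand-ins, not the vboxsf types; the host call itself is
not shown and is represented only by its result.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the volume info returned by the host. */
struct sketch_volinfo {
	bool case_sensitive;
};

/* Hypothetical stand-in for the per-mount cache. */
struct sketch_sbi {
	bool case_insensitive;	/* stays false until a query succeeds */
};

/*
 * Cache the host's answer. err is the result of the (not shown) host
 * call, reply_len the number of bytes the host actually filled in.
 */
static int sketch_cache_case(struct sketch_sbi *sbi, int err,
			     size_t reply_len,
			     const struct sketch_volinfo *vol)
{
	if (err)
		return err;	/* mount code may ignore this; default stays */
	if (reply_len < sizeof(*vol))
		return 0;	/* short reply: keep the default */
	sbi->case_insensitive = !vol->case_sensitive;
	return 0;
}
```

The point of the sketch is that every failure path leaves the cached
value at its zero-initialized default, so a later fileattr query reports
case-sensitive behavior unless the host positively said otherwise.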
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/vboxsf/dir.c | 1 +
fs/vboxsf/file.c | 6 ++++--
fs/vboxsf/super.c | 7 +++++++
fs/vboxsf/utils.c | 30 ++++++++++++++++++++++++++++++
fs/vboxsf/vfsmod.h | 6 ++++++
5 files changed, 48 insertions(+), 2 deletions(-)
diff --git a/fs/vboxsf/dir.c b/fs/vboxsf/dir.c
index 42bedc4ec7af..c5bd3271aa96 100644
--- a/fs/vboxsf/dir.c
+++ b/fs/vboxsf/dir.c
@@ -477,4 +477,5 @@ const struct inode_operations vboxsf_dir_iops = {
.symlink = vboxsf_dir_symlink,
.getattr = vboxsf_getattr,
.setattr = vboxsf_setattr,
+ .fileattr_get = vboxsf_fileattr_get,
};
diff --git a/fs/vboxsf/file.c b/fs/vboxsf/file.c
index 7a7a3fbb2651..943953867e18 100644
--- a/fs/vboxsf/file.c
+++ b/fs/vboxsf/file.c
@@ -222,7 +222,8 @@ const struct file_operations vboxsf_reg_fops = {
const struct inode_operations vboxsf_reg_iops = {
.getattr = vboxsf_getattr,
- .setattr = vboxsf_setattr
+ .setattr = vboxsf_setattr,
+ .fileattr_get = vboxsf_fileattr_get,
};
static int vboxsf_read_folio(struct file *file, struct folio *folio)
@@ -389,5 +390,6 @@ static const char *vboxsf_get_link(struct dentry *dentry, struct inode *inode,
}
const struct inode_operations vboxsf_lnk_iops = {
- .get_link = vboxsf_get_link
+ .get_link = vboxsf_get_link,
+ .fileattr_get = vboxsf_fileattr_get,
};
diff --git a/fs/vboxsf/super.c b/fs/vboxsf/super.c
index a618cb093e00..a61fbab51d37 100644
--- a/fs/vboxsf/super.c
+++ b/fs/vboxsf/super.c
@@ -185,6 +185,13 @@ static int vboxsf_fill_super(struct super_block *sb, struct fs_context *fc)
if (err)
goto fail_unmap;
+ /*
+ * A failed query leaves sbi->case_insensitive false, so the
+ * mount defaults to reporting case-sensitive behavior. Do not
+ * fail the mount over an advisory attribute.
+ */
+ vboxsf_query_case_sensitive(sbi);
+
sb->s_magic = VBOXSF_SUPER_MAGIC;
sb->s_blocksize = 1024;
sb->s_maxbytes = MAX_LFS_FILESIZE;
diff --git a/fs/vboxsf/utils.c b/fs/vboxsf/utils.c
index 440e8c50629d..298bfc93255c 100644
--- a/fs/vboxsf/utils.c
+++ b/fs/vboxsf/utils.c
@@ -11,6 +11,7 @@
#include <linux/sizes.h>
#include <linux/pagemap.h>
#include <linux/vfs.h>
+#include <linux/fileattr.h>
#include "vfsmod.h"
struct inode *vboxsf_new_inode(struct super_block *sb)
@@ -567,3 +568,32 @@ int vboxsf_dir_read_all(struct vboxsf_sbi *sbi, struct vboxsf_dir_info *sf_d,
return err;
}
+
+int vboxsf_query_case_sensitive(struct vboxsf_sbi *sbi)
+{
+ struct shfl_volinfo volinfo = {};
+ u32 buf_len;
+ int err;
+
+ buf_len = sizeof(volinfo);
+ err = vboxsf_fsinfo(sbi->root, 0, SHFL_INFO_GET | SHFL_INFO_VOLUME,
+ &buf_len, &volinfo);
+ if (err)
+ return err;
+ if (buf_len < sizeof(volinfo))
+ return 0;
+
+ sbi->case_insensitive = !volinfo.properties.case_sensitive;
+ return 0;
+}
+
+int vboxsf_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+ struct vboxsf_sbi *sbi = VBOXSF_SBI(dentry->d_sb);
+
+ if (sbi->case_insensitive) {
+ fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+ fa->flags |= FS_CASEFOLD_FL;
+ }
+ return 0;
+}
diff --git a/fs/vboxsf/vfsmod.h b/fs/vboxsf/vfsmod.h
index 05973eb89d52..b61afd0ce842 100644
--- a/fs/vboxsf/vfsmod.h
+++ b/fs/vboxsf/vfsmod.h
@@ -47,6 +47,7 @@ struct vboxsf_sbi {
u32 next_generation;
u32 root;
int bdi_id;
+ bool case_insensitive;
};
/* per-inode information */
@@ -111,6 +112,11 @@ void vboxsf_dir_info_free(struct vboxsf_dir_info *p);
int vboxsf_dir_read_all(struct vboxsf_sbi *sbi, struct vboxsf_dir_info *sf_d,
u64 handle);
+int vboxsf_query_case_sensitive(struct vboxsf_sbi *sbi);
+
+struct file_kattr;
+int vboxsf_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
+
/* from vboxsf_wrappers.c */
int vboxsf_connect(void);
void vboxsf_disconnect(void);
--
2.53.0
* [PATCH v11 12/15] isofs: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-04-25 1:53 UTC (permalink / raw)
To: Al Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-0-de5619beddaf@oracle.com>
From: Chuck Lever <chuck.lever@oracle.com>
Upper layers such as NFSD need a way to query whether a
filesystem handles filenames in a case-sensitive manner so
they can provide correct semantics to remote clients. Without
this information, NFS exports of ISO 9660 filesystems cannot
advertise their filename case behavior.
Implement isofs_fileattr_get() to report ISO 9660 case handling
behavior via the FS_XFLAG_CASEFOLD flag. The 'check=r' (relaxed)
mount option enables case-insensitive lookups, and this setting
determines the value reported. By default, Joliet extensions
operate in relaxed mode while plain ISO 9660 uses strict
(case-sensitive) mode. All ISO 9660 variants are case-preserving,
meaning filenames are stored exactly as they appear on the disc.
Case handling is a superblock-wide property, so the callback
must report the same value for every inode type. Regular files
previously had no inode_operations; introduce
isofs_file_inode_operations to carry the callback. Symlinks
previously shared page_symlink_inode_operations; introduce
isofs_symlink_inode_operations, which wires page_get_link
alongside the callback, so that fileattr queries on a symlink
reach the isofs implementation instead of returning
-ENOIOCTLCMD. The flag is set in both fa->fsx_xflags and
fa->flags so FS_IOC_FSGETXATTR and FS_IOC_GETFLAGS agree.
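
For readers who want to observe the result from user space, here is a
minimal sketch that reads the legacy flag word with FS_IOC_GETFLAGS and
tests FS_CASEFOLD_FL. The helper name path_reports_casefold() is made up
for illustration. Note that open() follows symlinks, so this probes the
link target rather than the link itself.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

#ifndef FS_CASEFOLD_FL
#define FS_CASEFOLD_FL 0x40000000 /* uapi value, for older header sets */
#endif

/* Returns 1 if FS_CASEFOLD_FL is reported, 0 if not, -1 on any error. */
static int path_reports_casefold(const char *path)
{
	unsigned int flags = 0;
	int fd, ret;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	ret = ioctl(fd, FS_IOC_GETFLAGS, &flags);
	close(fd);
	if (ret < 0)
		return -1;	/* e.g. filesystem without fileattr support */
	return (flags & FS_CASEFOLD_FL) ? 1 : 0;
}
```

With this patch applied, a file on an isofs mount using check=r would be
expected to return 1 and one using strict checking 0; the sketch does
not distinguish "unsupported" from other errors.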
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/isofs/dir.c | 24 ++++++++++++++++++++++++
fs/isofs/inode.c | 3 ++-
fs/isofs/isofs.h | 5 +++++
3 files changed, 31 insertions(+), 1 deletion(-)
diff --git a/fs/isofs/dir.c b/fs/isofs/dir.c
index 2fd9948d606e..1db6b0db3808 100644
--- a/fs/isofs/dir.c
+++ b/fs/isofs/dir.c
@@ -14,6 +14,7 @@
#include <linux/gfp.h>
#include <linux/filelock.h>
#include "isofs.h"
+#include <linux/fileattr.h>
int isofs_name_translate(struct iso_directory_record *de, char *new, struct inode *inode)
{
@@ -267,6 +268,17 @@ static int isofs_readdir(struct file *file, struct dir_context *ctx)
return result;
}
+int isofs_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
+{
+ struct isofs_sb_info *sbi = ISOFS_SB(dentry->d_sb);
+
+ if (sbi->s_check == 'r') {
+ fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
+ fa->flags |= FS_CASEFOLD_FL;
+ }
+ return 0;
+}
+
const struct file_operations isofs_dir_operations =
{
.llseek = generic_file_llseek,
@@ -281,6 +293,18 @@ const struct file_operations isofs_dir_operations =
const struct inode_operations isofs_dir_inode_operations =
{
.lookup = isofs_lookup,
+ .fileattr_get = isofs_fileattr_get,
+};
+
+const struct inode_operations isofs_file_inode_operations =
+{
+ .fileattr_get = isofs_fileattr_get,
+};
+
+const struct inode_operations isofs_symlink_inode_operations =
+{
+ .get_link = page_get_link,
+ .fileattr_get = isofs_fileattr_get,
};
diff --git a/fs/isofs/inode.c b/fs/isofs/inode.c
index efee53717f1c..68c286b7cc35 100644
--- a/fs/isofs/inode.c
+++ b/fs/isofs/inode.c
@@ -1427,6 +1427,7 @@ static int isofs_read_inode(struct inode *inode, int relocated)
/* Install the inode operations vector */
if (S_ISREG(inode->i_mode)) {
+ inode->i_op = &isofs_file_inode_operations;
inode->i_fop = &generic_ro_fops;
switch (ei->i_file_format) {
#ifdef CONFIG_ZISOFS
@@ -1442,7 +1443,7 @@ static int isofs_read_inode(struct inode *inode, int relocated)
inode->i_op = &isofs_dir_inode_operations;
inode->i_fop = &isofs_dir_operations;
} else if (S_ISLNK(inode->i_mode)) {
- inode->i_op = &page_symlink_inode_operations;
+ inode->i_op = &isofs_symlink_inode_operations;
inode_nohighmem(inode);
inode->i_data.a_ops = &isofs_symlink_aops;
} else if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode) ||
diff --git a/fs/isofs/isofs.h b/fs/isofs/isofs.h
index 506555837533..a3cda3430020 100644
--- a/fs/isofs/isofs.h
+++ b/fs/isofs/isofs.h
@@ -197,7 +197,12 @@ isofs_normalize_block_and_offset(struct iso_directory_record* de,
}
}
+struct file_kattr;
+int isofs_fileattr_get(struct dentry *dentry, struct file_kattr *fa);
+
extern const struct inode_operations isofs_dir_inode_operations;
+extern const struct inode_operations isofs_file_inode_operations;
+extern const struct inode_operations isofs_symlink_inode_operations;
extern const struct file_operations isofs_dir_operations;
extern const struct address_space_operations isofs_symlink_aops;
extern const struct export_operations isofs_export_ops;
--
2.53.0
* [PATCH v11 13/15] nfsd: Report export case-folding via NFSv3 PATHCONF
From: Chuck Lever @ 2026-04-25 1:53 UTC (permalink / raw)
To: Al Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-0-de5619beddaf@oracle.com>
From: Chuck Lever <chuck.lever@oracle.com>
The hard-coded MSDOS_SUPER_MAGIC check in nfsd3_proc_pathconf()
only recognizes FAT filesystems as case-insensitive. Modern
filesystems like F2FS, exFAT, and CIFS support case-insensitive
directories, but NFSv3 clients cannot discover this capability.
Query the export's actual case behavior through ->fileattr_get
instead. This allows NFSv3 clients to correctly handle case
sensitivity for any filesystem that implements the fileattr
interface. Filesystems without ->fileattr_get continue to report
the default POSIX behavior (case-sensitive, case-preserving).
This change depends on commit ("fat: Implement fileattr_get for
case sensitivity"), which ensures FAT filesystems report their
case behavior correctly via the fileattr interface.
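
The fallback rule described above can be sketched as a small pure
function. This is only an illustration of the decision logic, written
for user space: the NFSD_SK_* flag bits are placeholders rather than the
real FS_XFLAG_* values, NFSD_SK_ENOIOCTLCMD stands in for the
kernel-internal errno, and nfsd_sk_case_info() is a hypothetical name.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder definitions for illustration; NOT the kernel's values. */
#define NFSD_SK_XFLAG_CASEFOLD		(1u << 0)
#define NFSD_SK_XFLAG_CASENONPRESERVING	(1u << 1)
#define NFSD_SK_ENOIOCTLCMD		515

/*
 * err and xflags model the result of the fileattr query. Returns 0 and
 * fills both outputs, or returns a negative errno and leaves them alone.
 */
static int nfsd_sk_case_info(int err, uint32_t xflags,
			     bool *insensitive, bool *preserving)
{
	switch (err) {
	case 0:
		break;
	case -EINVAL:
	case -ENOTTY:
	case -NFSD_SK_ENOIOCTLCMD:
		xflags = 0;	/* unsupported: report POSIX defaults */
		break;
	default:
		return err;	/* real failure: case behavior unknown */
	}
	*insensitive = (xflags & NFSD_SK_XFLAG_CASEFOLD) != 0;
	*preserving = !(xflags & NFSD_SK_XFLAG_CASENONPRESERVING);
	return 0;
}
```

Treating "not supported" as a successful answer with default values is
what lets filesystems without ->fileattr_get keep their current PATHCONF
replies, while a genuine I/O error is surfaced to the client.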
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs3proc.c | 18 ++++++++++--------
fs/nfsd/vfs.c | 43 +++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/vfs.h | 3 +++
3 files changed, 56 insertions(+), 8 deletions(-)
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index 42adc5461db0..7b094c5908f1 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -717,17 +717,19 @@ nfsd3_proc_pathconf(struct svc_rqst *rqstp)
if (resp->status == nfs_ok) {
struct super_block *sb = argp->fh.fh_dentry->d_sb;
+ bool case_insensitive, case_preserving;
- /* Note that we don't care for remote fs's here */
- switch (sb->s_magic) {
- case EXT2_SUPER_MAGIC:
+ if (sb->s_magic == EXT2_SUPER_MAGIC) {
resp->p_link_max = EXT2_LINK_MAX;
resp->p_name_max = EXT2_NAME_LEN;
- break;
- case MSDOS_SUPER_MAGIC:
- resp->p_case_insensitive = 1;
- resp->p_case_preserving = 0;
- break;
+ }
+
+ resp->status = nfsd_get_case_info(argp->fh.fh_dentry,
+ &case_insensitive,
+ &case_preserving);
+ if (resp->status == nfs_ok) {
+ resp->p_case_insensitive = case_insensitive;
+ resp->p_case_preserving = case_preserving;
}
}
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index eafdf7b7890f..9214f1f1f83d 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -32,6 +32,7 @@
#include <linux/writeback.h>
#include <linux/security.h>
#include <linux/sunrpc/xdr.h>
+#include <linux/fileattr.h>
#include "xdr3.h"
@@ -2891,3 +2892,45 @@ nfsd_permission(struct svc_cred *cred, struct svc_export *exp,
return err? nfserrno(err) : 0;
}
+
+/**
+ * nfsd_get_case_info - get case sensitivity info for a dentry
+ * @dentry: dentry to query
+ * @case_insensitive: output, true if the filesystem is case-insensitive
+ * @case_preserving: output, true if the filesystem preserves case
+ *
+ * Filesystems without ->fileattr_get report POSIX defaults
+ * (case-sensitive, case-preserving). Outputs are unmodified on
+ * failure.
+ *
+ * Return: nfs_ok on success, or an nfserr on failure.
+ */
+__be32
+nfsd_get_case_info(struct dentry *dentry, bool *case_insensitive,
+ bool *case_preserving)
+{
+ struct file_kattr fa = {};
+ int err;
+
+ err = vfs_fileattr_get(dentry, &fa);
+ switch (err) {
+ case 0:
+ /* Success. */
+ break;
+ case -EINVAL:
+ case -ENOTTY:
+ case -ENOIOCTLCMD:
+ /* Query not supported: Report POSIX defaults. */
+ break;
+ default:
+ /*
+ * Query failed: Propagate that error since
+ * support for case-folding is unknown.
+ */
+ return nfserrno(err);
+ }
+
+ *case_insensitive = fa.fsx_xflags & FS_XFLAG_CASEFOLD;
+ *case_preserving = !(fa.fsx_xflags & FS_XFLAG_CASENONPRESERVING);
+ return nfs_ok;
+}
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 702a844f2106..abf33389ee81 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -156,6 +156,9 @@ __be32 nfsd_readdir(struct svc_rqst *, struct svc_fh *,
loff_t *, struct readdir_cd *, nfsd_filldir_t);
__be32 nfsd_statfs(struct svc_rqst *, struct svc_fh *,
struct kstatfs *, int access);
+__be32 nfsd_get_case_info(struct dentry *dentry,
+ bool *case_insensitive,
+ bool *case_preserving);
__be32 nfsd_permission(struct svc_cred *cred, struct svc_export *exp,
struct dentry *dentry, int acc);
--
2.53.0
* [PATCH v11 14/15] nfsd: Implement NFSv4 FATTR4_CASE_INSENSITIVE and FATTR4_CASE_PRESERVING
From: Chuck Lever @ 2026-04-25 1:53 UTC (permalink / raw)
To: Al Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-0-de5619beddaf@oracle.com>
From: Chuck Lever <chuck.lever@oracle.com>
NFSD currently provides NFSv4 clients with hard-coded responses
indicating all exported filesystems are case-sensitive and
case-preserving. This is incorrect for case-insensitive filesystems
and ext4 directories with casefold enabled.
Query the underlying filesystem's actual case sensitivity via
nfsd_get_case_info() and return accurate values to clients. This
supports per-directory settings for filesystems that allow mixing
case-sensitive and case-insensitive directories within an export.
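
To make the per-directory behavior concrete, the sketch below models
which object supplies the answer, following the rule stated in the code
comment in this patch: a directory is queried as-is, while any other
object takes its name-comparison semantics from its parent. struct
v4sk_node and the helper names are toy stand-ins, not kernel types.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a dentry; field names are illustrative only. */
struct v4sk_node {
	bool is_dir;
	bool casefold;			/* meaningful on directories */
	struct v4sk_node *parent;
};

/* Pick the object whose casefold flag governs name comparison. */
static const struct v4sk_node *v4sk_case_source(const struct v4sk_node *n)
{
	return n->is_dir ? n : n->parent;
}

/* Value that would be reported as case_insensitive for this object. */
static bool v4sk_case_insensitive(const struct v4sk_node *n)
{
	return v4sk_case_source(n)->casefold;
}
```

Under this model a regular file inside a casefolded directory reports
case-insensitive, while a file sitting directly in a non-casefolded
export root does not, even though both share the same fsid.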
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4xdr.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 52 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 2a0946c630e1..68b23863dab1 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3158,6 +3158,8 @@ struct nfsd4_fattr_args {
u32 rdattr_err;
bool contextsupport;
bool ignore_crossmnt;
+ bool case_insensitive;
+ bool case_preserving;
};
typedef __be32(*nfsd4_enc_attr)(struct xdr_stream *xdr,
@@ -3356,6 +3358,33 @@ static __be32 nfsd4_encode_fattr4_acl(struct xdr_stream *xdr,
return nfs_ok;
}
+static __be32 nfsd4_encode_fattr4_case_insensitive(struct xdr_stream *xdr,
+ const struct nfsd4_fattr_args *args)
+{
+ return nfsd4_encode_bool(xdr, args->case_insensitive);
+}
+
+static __be32 nfsd4_encode_fattr4_case_preserving(struct xdr_stream *xdr,
+ const struct nfsd4_fattr_args *args)
+{
+ return nfsd4_encode_bool(xdr, args->case_preserving);
+}
+
+static __be32 nfsd4_encode_fattr4_homogeneous(struct xdr_stream *xdr,
+ const struct nfsd4_fattr_args *args)
+{
+ /*
+ * Filesystems with a Unicode encoding loaded (e.g. ext4, f2fs
+ * with the casefold feature) expose case folding as a
+ * per-directory attribute, so the per-file-system
+ * case_insensitive and case_preserving values can legitimately
+ * differ across objects that share the same fsid. Report
+ * FATTR4_HOMOGENEOUS = FALSE on such filesystems to keep that
+ * variation consistent with RFC 8881 Section 5.8.2.16.
+ */
+ return nfsd4_encode_bool(xdr, !sb_has_encoding(args->dentry->d_sb));
+}
+
static __be32 nfsd4_encode_fattr4_filehandle(struct xdr_stream *xdr,
const struct nfsd4_fattr_args *args)
{
@@ -3748,8 +3777,8 @@ static const nfsd4_enc_attr nfsd4_enc_fattr4_encode_ops[] = {
[FATTR4_ACLSUPPORT] = nfsd4_encode_fattr4_aclsupport,
[FATTR4_ARCHIVE] = nfsd4_encode_fattr4__noop,
[FATTR4_CANSETTIME] = nfsd4_encode_fattr4__true,
- [FATTR4_CASE_INSENSITIVE] = nfsd4_encode_fattr4__false,
- [FATTR4_CASE_PRESERVING] = nfsd4_encode_fattr4__true,
+ [FATTR4_CASE_INSENSITIVE] = nfsd4_encode_fattr4_case_insensitive,
+ [FATTR4_CASE_PRESERVING] = nfsd4_encode_fattr4_case_preserving,
[FATTR4_CHOWN_RESTRICTED] = nfsd4_encode_fattr4__true,
[FATTR4_FILEHANDLE] = nfsd4_encode_fattr4_filehandle,
[FATTR4_FILEID] = nfsd4_encode_fattr4_fileid,
@@ -3758,7 +3787,7 @@ static const nfsd4_enc_attr nfsd4_enc_fattr4_encode_ops[] = {
[FATTR4_FILES_TOTAL] = nfsd4_encode_fattr4_files_total,
[FATTR4_FS_LOCATIONS] = nfsd4_encode_fattr4_fs_locations,
[FATTR4_HIDDEN] = nfsd4_encode_fattr4__noop,
- [FATTR4_HOMOGENEOUS] = nfsd4_encode_fattr4__true,
+ [FATTR4_HOMOGENEOUS] = nfsd4_encode_fattr4_homogeneous,
[FATTR4_MAXFILESIZE] = nfsd4_encode_fattr4_maxfilesize,
[FATTR4_MAXLINK] = nfsd4_encode_fattr4_maxlink,
[FATTR4_MAXNAME] = nfsd4_encode_fattr4_maxname,
@@ -3968,6 +3997,26 @@ nfsd4_encode_fattr4(struct svc_rqst *rqstp, struct xdr_stream *xdr,
args.fhp = tempfh;
} else
args.fhp = fhp;
+ if (attrmask[0] & (FATTR4_WORD0_CASE_INSENSITIVE |
+ FATTR4_WORD0_CASE_PRESERVING)) {
+ struct dentry *cd = dentry;
+
+ /*
+ * On casefold-capable file systems the flag lives
+ * on the directory, not on its entries. For a
+ * non-directory object, name-comparison semantics
+ * come from its parent. A directory (including the
+ * export root, whose parent is outside the export)
+ * is queried as-is so its own contents' lookup
+ * behaviour is reported.
+ */
+ if (!d_is_dir(dentry))
+ cd = dentry->d_parent;
+ status = nfsd_get_case_info(cd, &args.case_insensitive,
+ &args.case_preserving);
+ if (status != nfs_ok)
+ goto out;
+ }
if (attrmask[0] & FATTR4_WORD0_ACL) {
err = nfsd4_get_nfs4_acl(rqstp, dentry, &args.acl);
--
2.53.0
* [PATCH v11 15/15] ksmbd: Report filesystem case sensitivity via FS_ATTRIBUTE_INFORMATION
From: Chuck Lever @ 2026-04-25 1:53 UTC (permalink / raw)
To: Al Viro, Christian Brauner, Jan Kara
Cc: linux-fsdevel, linux-ext4, linux-xfs, linux-cifs, linux-nfs,
linux-api, linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo,
yuezhang.mo, almaz.alexandrovich, slava, glaubitz, frank.li,
tytso, adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-0-de5619beddaf@oracle.com>
From: Chuck Lever <chuck.lever@oracle.com>
FS_ATTRIBUTE_INFORMATION responses have always reported
FILE_CASE_SENSITIVE_SEARCH and FILE_CASE_PRESERVED_NAMES
unconditionally. Case-insensitive filesystems like exFAT, and
casefolded directories on ext4 or f2fs, have no way to signal
their actual semantics to SMB clients.
Now that filesystems expose case behavior through ->fileattr_get,
query it via vfs_fileattr_get() and translate the FS_XFLAG_CASEFOLD
and FS_XFLAG_CASENONPRESERVING flags into the corresponding SMB
attributes. Filesystems without ->fileattr_get continue reporting
default POSIX behavior (case-sensitive, case-preserving).
SMB's FS_ATTRIBUTE_INFORMATION reports per-share attributes from
the share root, not per-file. Shares mixing casefold and
non-casefold directories report the root directory's behavior.
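
The translation from fileattr flags to SMB attribute bits is an
inversion in both cases: each SMB bit is set when the corresponding
xflag is absent. A stand-alone sketch of that mapping follows. The two
FILE_* values are the ones defined by the SMB file system attribute
encoding; the SMB_SK_XFLAG_* bits are placeholders, not the real
FS_XFLAG_* values, and smb_sk_case_attrs() is a hypothetical name.

```c
#include <stdint.h>

/* SMB file system attribute bits relevant to case handling. */
#define FILE_CASE_SENSITIVE_SEARCH	0x00000001u
#define FILE_CASE_PRESERVED_NAMES	0x00000002u

/* Placeholder xflag bits for illustration; NOT the kernel's values. */
#define SMB_SK_XFLAG_CASEFOLD		(1u << 0)
#define SMB_SK_XFLAG_CASENONPRESERVING	(1u << 1)

/* Map fileattr xflags to the case-related SMB attribute bits. */
static uint32_t smb_sk_case_attrs(uint32_t xflags)
{
	uint32_t attrs = 0;

	if (!(xflags & SMB_SK_XFLAG_CASEFOLD))
		attrs |= FILE_CASE_SENSITIVE_SEARCH;
	if (!(xflags & SMB_SK_XFLAG_CASENONPRESERVING))
		attrs |= FILE_CASE_PRESERVED_NAMES;
	return attrs;
}
```

Because a zeroed flag word yields both bits, a filesystem that does not
implement the query keeps reporting exactly what ksmbd reported before
this patch.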
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/smb/server/smb2pdu.c | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)
diff --git a/fs/smb/server/smb2pdu.c b/fs/smb/server/smb2pdu.c
index ee32e61b6d3c..face5390c614 100644
--- a/fs/smb/server/smb2pdu.c
+++ b/fs/smb/server/smb2pdu.c
@@ -14,6 +14,7 @@
#include <linux/falloc.h>
#include <linux/mount.h>
#include <linux/filelock.h>
+#include <linux/fileattr.h>
#include "glob.h"
#include "smbfsctl.h"
@@ -5541,16 +5542,33 @@ static int smb2_get_info_filesystem(struct ksmbd_work *work,
case FS_ATTRIBUTE_INFORMATION:
{
FILE_SYSTEM_ATTRIBUTE_INFO *info;
+ struct file_kattr fa = {};
size_t sz;
+ u32 attrs;
+ int err;
info = (FILE_SYSTEM_ATTRIBUTE_INFO *)rsp->Buffer;
- info->Attributes = cpu_to_le32(FILE_SUPPORTS_OBJECT_IDS |
- FILE_PERSISTENT_ACLS |
- FILE_UNICODE_ON_DISK |
- FILE_CASE_PRESERVED_NAMES |
- FILE_CASE_SENSITIVE_SEARCH |
- FILE_SUPPORTS_BLOCK_REFCOUNTING);
+ attrs = FILE_SUPPORTS_OBJECT_IDS |
+ FILE_PERSISTENT_ACLS |
+ FILE_UNICODE_ON_DISK |
+ FILE_SUPPORTS_BLOCK_REFCOUNTING;
+ err = vfs_fileattr_get(path.dentry, &fa);
+ /*
+ * -EINVAL: ntfs-3g and other FUSE filesystems that lack
+ * FS_IOC_FSGETXATTR support.
+ */
+ if (err && err != -ENOIOCTLCMD && err != -ENOTTY &&
+ err != -EINVAL) {
+ path_put(&path);
+ return err;
+ }
+ if (!(fa.fsx_xflags & FS_XFLAG_CASEFOLD))
+ attrs |= FILE_CASE_SENSITIVE_SEARCH;
+ if (!(fa.fsx_xflags & FS_XFLAG_CASENONPRESERVING))
+ attrs |= FILE_CASE_PRESERVED_NAMES;
+
+ info->Attributes = cpu_to_le32(attrs);
info->Attributes |= cpu_to_le32(server_conf.share_fake_fscaps);
if (test_share_config_flag(work->tcon->share_conf,
--
2.53.0
* Re: [PATCH v3 1/9] kernel/api: introduce kernel API specification framework
From: Nathan Chancellor @ 2026-04-27 3:37 UTC (permalink / raw)
To: Sasha Levin
Cc: linux-api, linux-kernel, linux-doc, linux-fsdevel, linux-kbuild,
linux-kselftest, workflows, tools, x86, Thomas Gleixner,
Paul E . McKenney, Greg Kroah-Hartman, Jonathan Corbet,
Dmitry Vyukov, Randy Dunlap, Cyril Hrubis, Kees Cook, Jake Edge,
David Laight, Askar Safin, Gabriele Paoloni,
Mauro Carvalho Chehab, Christian Brauner, Alexander Viro,
Andrew Morton, Masahiro Yamada, Shuah Khan, Ingo Molnar,
Arnd Bergmann
In-Reply-To: <20260424165130.2306833-2-sashal@kernel.org>
On Fri, 24 Apr 2026 12:51:21 -0400, Sasha Levin <sashal@kernel.org> wrote:
> diff --git a/kernel/Makefile b/kernel/Makefile
> index 6785982013dc..564315153643 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -59,6 +59,9 @@ obj-y += dma/
> obj-y += entry/
> obj-y += unwind/
> obj-$(CONFIG_MODULES) += module/
> +obj-$(CONFIG_KAPI_SPEC) += api/
> +# Ensure api/ is always cleaned even when CONFIG_KAPI_SPEC is not set
> +obj- += api/
If $(CONFIG_KAPI_SPEC) is not set, shouldn't
obj-$(CONFIG_KAPI_SPEC) += api/
evaluate to
obj- += api/
anyways? Why the duplication? This is the only place in the kernel where
this would be needed?
>
> diff --git a/kernel/api/.gitignore b/kernel/api/.gitignore
> new file mode 100644
> index 000000000000..ca2f632621cf
> --- /dev/null
> +++ b/kernel/api/.gitignore
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +/generated_api_specs.c
This appears unused?
>
> diff --git a/kernel/api/Kconfig b/kernel/api/Kconfig
> new file mode 100644
> index 000000000000..d1072728742a
> --- /dev/null
> +++ b/kernel/api/Kconfig
> @@ -0,0 +1,77 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +# Kernel API Specification Framework Configuration
> +#
> +
> +config KAPI_SPEC
> + bool "Kernel API Specification Framework"
> + default n
I think 'default n' is tautological since 'n' is the default for all
bool symbols. Consider dropping it on all symbols throughout this file.
--
Nathan Chancellor <nathan@kernel.org>
* Re: [PATCH v11 12/15] isofs: Implement fileattr_get for case sensitivity
From: Jan Kara @ 2026-04-27 10:44 UTC (permalink / raw)
To: Chuck Lever
Cc: Al Viro, Christian Brauner, Jan Kara, linux-fsdevel, linux-ext4,
linux-xfs, linux-cifs, linux-nfs, linux-api, linux-f2fs-devel,
hirofumi, linkinjeon, sj1557.seo, yuezhang.mo,
almaz.alexandrovich, slava, glaubitz, frank.li, tytso,
adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Roland Mainz
In-Reply-To: <20260424-case-sensitivity-v11-12-de5619beddaf@oracle.com>
On Fri 24-04-26 21:53:14, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> Upper layers such as NFSD need a way to query whether a
> filesystem handles filenames in a case-sensitive manner so
> they can provide correct semantics to remote clients. Without
> this information, NFS exports of ISO 9660 filesystems cannot
> advertise their filename case behavior.
>
> Implement isofs_fileattr_get() to report ISO 9660 case handling
> behavior via the FS_XFLAG_CASEFOLD flag. The 'check=r' (relaxed)
> mount option enables case-insensitive lookups, and this setting
> determines the value reported. By default, Joliet extensions
> operate in relaxed mode while plain ISO 9660 uses strict
> (case-sensitive) mode. All ISO 9660 variants are case-preserving,
> meaning filenames are stored exactly as they appear on the disc.
>
> Case handling is a superblock-wide property, so the callback
> must report the same value for every inode type. Regular files
> previously had no inode_operations; introduce
> isofs_file_inode_operations to carry the callback. Symlinks
> previously shared page_symlink_inode_operations; introduce
> isofs_symlink_inode_operations, which wires page_get_link
> alongside the callback, so that fileattr queries on a symlink
> reach the isofs implementation instead of returning
> -ENOIOCTLCMD. The flag is set in both fa->fsx_xflags and
> fa->flags so FS_IOC_FSGETXATTR and FS_IOC_GETFLAGS agree.
>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
...
> @@ -281,6 +293,18 @@ const struct file_operations isofs_dir_operations =
> const struct inode_operations isofs_dir_inode_operations =
> {
> .lookup = isofs_lookup,
> + .fileattr_get = isofs_fileattr_get,
> +};
> +
> +const struct inode_operations isofs_file_inode_operations =
> +{
> + .fileattr_get = isofs_fileattr_get,
> +};
> +
> +const struct inode_operations isofs_symlink_inode_operations =
> +{
> + .get_link = page_get_link,
> + .fileattr_get = isofs_fileattr_get,
> };
Hum, I thought casefolding is a directory attribute. At least I don't see
a big point in reporting it for regular files or symlinks (and then why not
report it for device nodes or named pipes?). So why did you decide for this
change?
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH v11 00/15] Exposing case folding behavior
From: Jan Kara @ 2026-04-27 10:55 UTC (permalink / raw)
To: Chuck Lever
Cc: Al Viro, Christian Brauner, Jan Kara, linux-fsdevel, linux-ext4,
linux-xfs, linux-cifs, linux-nfs, linux-api, linux-f2fs-devel,
hirofumi, linkinjeon, sj1557.seo, yuezhang.mo,
almaz.alexandrovich, slava, glaubitz, frank.li, tytso,
adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Darrick J. Wong, Roland Mainz, Steve French
In-Reply-To: <20260424-case-sensitivity-v11-0-de5619beddaf@oracle.com>
On Fri 24-04-26 21:53:02, Chuck Lever wrote:
> Changes since v10:
> - cifs: Source case-handling flags from the server's cached
> FS_ATTRIBUTE_INFORMATION reply instead of the nocase mount
> option, with a nocase fallback when the reply is absent
> - Address findings from sashiko(gemini-3) and gpt-5.5:
> - nfs: Skip pathconf case bits on NFSv4 (set via FATTR4_CASE_*
> instead)
> - xfs: Hide FS_CASEFOLD_FL from the legacy flags view so
> chattr round-trips do not hit the setflags whitelist
> - ext4, f2fs: Drop redundant fileattr_get patches; the
> FS_CASEFOLD_FL translation in fileattr_fill_flags() already
> reports FS_XFLAG_CASEFOLD for casefolded directories
Err, how is this supposed to work? I wasn't able to find any code
transforming S_CASEFOLDED inode flag into FS_CASEFOLD_FL on fileattr_get
path. Sure, fileattr_fill_flags() takes care of setting FS_XFLAG_CASEFOLD
once FS_CASEFOLD_FL is set. What am I missing?
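
For anyone following along from userspace, here is a minimal sketch of how the legacy flags view can be inspected, assuming a Linux system with <linux/fs.h> available. It only observes whether FS_CASEFOLD_FL comes back from FS_IOC_GETFLAGS; it says nothing about where the kernel sets the flag, which is the open question above. The helper name query_casefold() is made up for illustration.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

/* Fallback for older UAPI headers; same value as include/uapi/linux/fs.h. */
#ifndef FS_CASEFOLD_FL
#define FS_CASEFOLD_FL 0x40000000
#endif

/*
 * Hypothetical helper: returns 1 if FS_CASEFOLD_FL is reported for path,
 * 0 if it is not, and -1 if the path cannot be opened or the filesystem
 * does not support FS_IOC_GETFLAGS.
 */
int query_casefold(const char *path)
{
	int flags = 0;	/* the ioctl reads and writes an int-sized value */
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0) {
		close(fd);
		return -1;
	}
	close(fd);
	return (flags & FS_CASEFOLD_FL) ? 1 : 0;
}
```

Calling query_casefold() on a casefolded ext4 or f2fs directory and on a regular file inside it would show what each inode type reports on a given kernel.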
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH v11 12/15] isofs: Implement fileattr_get for case sensitivity
From: Lionel Cons @ 2026-04-27 12:02 UTC (permalink / raw)
To: Jan Kara
Cc: Chuck Lever, Al Viro, Christian Brauner, linux-fsdevel,
linux-ext4, linux-xfs, linux-cifs, linux-nfs, linux-api,
linux-f2fs-devel, hirofumi, linkinjeon, sj1557.seo, yuezhang.mo,
almaz.alexandrovich, slava, glaubitz, frank.li, tytso,
adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
Roland Mainz
In-Reply-To: <isfgwmd5hxjfn7dj7p54yzlhumx2hrkt3zw7fscs2ywm57g3hu@co27drpx24lq>
On Mon, 27 Apr 2026 at 12:47, Jan Kara <jack@suse.cz> wrote:
>
> On Fri 24-04-26 21:53:14, Chuck Lever wrote:
> > From: Chuck Lever <chuck.lever@oracle.com>
> >
> > Upper layers such as NFSD need a way to query whether a
> > filesystem handles filenames in a case-sensitive manner so
> > they can provide correct semantics to remote clients. Without
> > this information, NFS exports of ISO 9660 filesystems cannot
> > advertise their filename case behavior.
> >
> > Implement isofs_fileattr_get() to report ISO 9660 case handling
> > behavior via the FS_XFLAG_CASEFOLD flag. The 'check=r' (relaxed)
> > mount option enables case-insensitive lookups, and this setting
> > determines the value reported. By default, Joliet extensions
> > operate in relaxed mode while plain ISO 9660 uses strict
> > (case-sensitive) mode. All ISO 9660 variants are case-preserving,
> > meaning filenames are stored exactly as they appear on the disc.
> >
> > Case handling is a superblock-wide property, so the callback
> > must report the same value for every inode type. Regular files
> > previously had no inode_operations; introduce
> > isofs_file_inode_operations to carry the callback. Symlinks
> > previously shared page_symlink_inode_operations; introduce
> > isofs_symlink_inode_operations, which wires page_get_link
> > alongside the callback, so that fileattr queries on a symlink
> > reach the isofs implementation instead of returning
> > -ENOIOCTLCMD. The flag is set in both fa->fsx_xflags and
> > fa->flags so FS_IOC_FSGETXATTR and FS_IOC_GETFLAGS agree.
> >
> > Reviewed-by: Jan Kara <jack@suse.cz>
> > Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org>
> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>
> ...
>
> > @@ -281,6 +293,18 @@ const struct file_operations isofs_dir_operations =
> > const struct inode_operations isofs_dir_inode_operations =
> > {
> > .lookup = isofs_lookup,
> > + .fileattr_get = isofs_fileattr_get,
> > +};
> > +
> > +const struct inode_operations isofs_file_inode_operations =
> > +{
> > + .fileattr_get = isofs_fileattr_get,
> > +};
> > +
> > +const struct inode_operations isofs_symlink_inode_operations =
> > +{
> > + .get_link = page_get_link,
> > + .fileattr_get = isofs_fileattr_get,
> > };
>
> Hum, I thought casefolding is a directory attribute. At least I don't see
> a big point in reporting it for regular files or symlinks (and then why not
> report it for device nodes or named pipes?). So why did you decide for this
> change?
Where do you see this being a per-directory attribute in
<https://web.archive.org/web/20170404043745/http://www.ymi.com/ymi/sites/default/files/pdf/Rockridge.pdf>?
Lionel
* Re: [PATCH v2 1/2] man/man3/errno.3: Document EFTYPE error code
From: Alejandro Colomar @ 2026-04-27 13:13 UTC (permalink / raw)
To: Florian Weimer
Cc: Dorjoy Chowdhury, linux-man, brauner, jlayton, libc-alpha,
linux-api
In-Reply-To: <lhu5x5c4rpl.fsf@oldenburg.str.redhat.com>
Hi Florian,
On 2026-04-27T12:34:30+0200, Florian Weimer wrote:
> * Alejandro Colomar:
>
> > [CC += libc-alpha]
> >
> > Hi Dorjoy,
> >
> > On 2026-04-26T17:14:25+0600, Dorjoy Chowdhury wrote:
> >> Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
> >
> > Thanks!
> >
> > Reviewed-by: Alejandro Colomar <alx@kernel.org>
> >
> > I will wait until glibc adds this error code to their <errno.h> before
> > applying the patch. This means either you should write and send a patch
> > to glibc (if so, please CC me), or you should ask them to add it
> > themselves (if you're not comfortable writing glibc code).
>
> I'm not sure where this is coming from.
Here's a link to the thread:
<https://lore.kernel.org/linux-man/20260426111707.36541-1-dorjoychy111@gmail.com/T/>
> POSIX says EFTYPE was rejected
> in favor of ENOTTY.
Could you please share a link to that?
Anyway, I guess ENOTTY would be inappropriate in this case. Although
maybe a better error code could be devised; I don't know. This is why
I wanted glibc involved in this discussion before this arrives to a
Linux release. Thanks for the quick feedback!
> Thanks,
> Florian
Have a lovely day!
Alex
--
<https://www.alejandro-colomar.es>
* Re: [PATCH v6 1/4] openat2: new OPENAT2_REGULAR flag support
From: Florian Weimer @ 2026-04-27 13:27 UTC (permalink / raw)
To: Dorjoy Chowdhury
Cc: linux-fsdevel, linux-kernel, linux-api, ceph-devel, gfs2,
linux-nfs, linux-cifs, v9fs, linux-kselftest, viro, brauner, jack,
jlayton, chuck.lever, alex.aring, arnd, adilger, mjguzik,
smfrench, richard.henderson, mattst88, linmag7, tsbogend,
James.Bottomley, deller, davem, andreas, idryomov, amarkuze,
slava, agruenba, trondmy, anna, sfrench, pc, ronniesahlberg,
sprasad, tom, bharathsm, shuah, miklos, hansg
In-Reply-To: <20260328172314.45807-2-dorjoychy111@gmail.com>
* Dorjoy Chowdhury:
> diff --git a/include/uapi/asm-generic/errno.h b/include/uapi/asm-generic/errno.h
> index 92e7ae493ee3..bd78e69e0a43 100644
> --- a/include/uapi/asm-generic/errno.h
> +++ b/include/uapi/asm-generic/errno.h
> @@ -122,4 +122,6 @@
>
> #define EHWPOISON 133 /* Memory page has hardware error */
>
> +#define EFTYPE 134 /* Wrong file type for the intended operation */
> +
> #endif
This is what POSIX says about EFTYPE, in the Rationale for System
Interfaces:
“
[EFTYPE]
This error code was proposed in earlier proposals as "Inappropriate
operation for file type", meaning that the operation requested is
not appropriate for the file specified in the function call. This
code was proposed, although the same idea was covered by [ENOTTY],
because the connotations of the name would be misleading. It was
pointed out that the fcntl() function uses the error code [EINVAL]
for this notion, and hence all instances of [EFTYPE] were changed to
this code.
”
So I'm not sure if reusing this name is a good idea.
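
To see whether a given platform already exposes the name, a preprocessor check is enough. This is only a sketch: the helper name eftype_value() is made up, and the result depends on the libc. The assumption here is that the BSDs define EFTYPE in <errno.h> while current glibc on Linux does not.

```c
#include <errno.h>

/*
 * Hypothetical helper: returns the value of EFTYPE if <errno.h> on this
 * platform defines it, or 0 if the name is not defined.
 */
int eftype_value(void)
{
#ifdef EFTYPE
	return EFTYPE;
#else
	return 0;
#endif
}
```

A nonzero result means existing code on that platform may already attach its own meaning to the name.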
Thanks,
Florian
* Re: [PATCH v2 1/2] man/man3/errno.3: Document EFTYPE error code
From: Florian Weimer @ 2026-04-27 13:29 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Dorjoy Chowdhury, linux-man, brauner, jlayton, libc-alpha,
linux-api
In-Reply-To: <ae9gDtEo6OxHTYBt@devuan>
* Alejandro Colomar:
> Hi Florian,
>
> On 2026-04-27T12:34:30+0200, Florian Weimer wrote:
>> * Alejandro Colomar:
>>
>> > [CC += libc-alpha]
>> >
>> > Hi Dorjoy,
>> >
>> > On 2026-04-26T17:14:25+0600, Dorjoy Chowdhury wrote:
>> >> Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
>> >
>> > Thanks!
>> >
>> > Reviewed-by: Alejandro Colomar <alx@kernel.org>
>> >
>> > I will wait until glibc adds this error code to their <errno.h> before
>> > applying the patch. This means either you should write and send a patch
>> > to glibc (if so, please CC me), or you should ask them to add it
>> > themselves (if you're not comfortable writing glibc code).
>>
>> I'm not sure where this is coming from.
>
> Here's a link to the thread:
> <https://lore.kernel.org/linux-man/20260426111707.36541-1-dorjoychy111@gmail.com/T/>
>
>> POSIX says EFTYPE was rejected
>> in favor of ENOTTY.
>
> Could you please share a link to that?
>
> Anyway, I guess ENOTTY would be inappropriate in this case. Although
> maybe a better error code could be devised; I don't know. This is why
> I wanted glibc involved in this discussion before this arrives to a
> Linux release. Thanks for the quick feedback!
It's in the Rationale for System Interfaces:
“
[EFTYPE]
This error code was proposed in earlier proposals as "Inappropriate
operation for file type", meaning that the operation requested is
not appropriate for the file specified in the function call. This
code was proposed, although the same idea was covered by [ENOTTY],
because the connotations of the name would be misleading. It was
pointed out that the fcntl() function uses the error code [EINVAL]
for this notion, and hence all instances of [EFTYPE] were changed to
this code.
”
I replied on linux-fsdevel, too.
(It would be nice to submit patches introducing new error codes to
linux-api with a subject mentioning the error code.)
Thanks,
Florian