* Re: [PATCH 1/6] uapi: promote EFSCORRUPTED and EUCLEAN to errno.h
From: Gao Xiang @ 2025-12-18 9:33 UTC
To: Darrick J. Wong, brauner
Cc: linux-api, linux-ext4, jack, linux-xfs, linux-fsdevel, gabriel,
hch, amir73il
In-Reply-To: <176602332146.686273.6355079912638580915.stgit@frogsfrogsfrogs>
On 2025/12/18 10:02, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Stop defining these privately and instead move them to the uapi
> errno.h so that they become canonical instead of copy pasta.
>
> Cc: linux-api@vger.kernel.org
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
(...I really think it could be done earlier)
Thanks,
Gao Xiang
* [PATCH v23 8/8] selftests/clone3: Test shadow stack support
From: Mark Brown @ 2025-12-18 8:10 UTC
To: Rick P. Edgecombe, Deepak Gupta, H.J. Lu, Florian Weimer,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Christian Brauner, Shuah Khan
Cc: linux-kernel, Catalin Marinas, Will Deacon, jannh, bsegall,
Andrew Morton, Yury Khrustalev, H.J. Lu, Adhemerval Zanella Netto,
Wilco Dijkstra, Carlos O'Donell, Florian Weimer, Rich Felker,
linux-kselftest, linux-api, Mark Brown, Kees Cook, Shuah Khan
In-Reply-To: <20251218-clone3-shadow-stack-v23-0-7cb318fbb385@kernel.org>
Add basic test coverage for specifying the shadow stack for a newly
created thread via clone3(), including coverage of the newly extended
argument structure. We check that a user specified shadow stack can be
provided, and that invalid combinations of parameters are rejected.
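As a rough illustration of what the test cases vary, the sketch below computes the address passed in shstk_token for a shadow stack mapping. The helper names are hypothetical and not part of the patch. The arithmetic mirrors the test's getpagesize() - sizeof(void *) with sizeof(void *) assumed to be 8, and the alignment rule is taken from the comment in the x86 validation code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not part of the patch: model where the tests
 * place the token for a shadow stack mapping of 'size' bytes at 'base'.
 */
#define SHSTK_TOKEN_SIZE 8	/* assumption: one 64-bit slot */

/* Address of the last slot in the mapping, where the token is expected. */
static uint64_t shstk_token_addr(uint64_t base, uint64_t size)
{
	return base + size - SHSTK_TOKEN_SIZE;
}

/* The token pointer is expected to be 8 byte aligned. */
static bool shstk_token_aligned(uint64_t addr)
{
	return (addr % SHSTK_TOKEN_SIZE) == 0;
}
```

The misaligned test case corresponds to subtracting one more byte from the token address, which this model reports as unaligned.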
In order to facilitate testing on systems without userspace shadow stack
support we manually enable shadow stacks on startup. This is architecture
specific due to the use of an arch_prctl() on x86. Due to interactions with
potential userspace locking of features we actually detect support for
shadow stacks on the running system by attempting to allocate a shadow
stack page during initialisation using map_shadow_stack(), warning if this
succeeds when the enable failed.
In order to allow testing of user configured shadow stacks on
architectures with that feature we need to ensure that we do not return
from the function where the clone3() syscall is called in the child
process, since doing so would trigger a shadow stack underflow. To do this
we use inline assembly rather than the standard syscall wrapper to call
clone3(). In order to avoid surprises we also use a syscall rather than
the libc exit() function; this should be overly cautious.
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
tools/testing/selftests/clone3/clone3.c | 143 +++++++++++++++++++++-
tools/testing/selftests/clone3/clone3_selftests.h | 63 ++++++++++
2 files changed, 205 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/clone3/clone3.c b/tools/testing/selftests/clone3/clone3.c
index 932cc64e9ae4..829f0d1268c8 100644
--- a/tools/testing/selftests/clone3/clone3.c
+++ b/tools/testing/selftests/clone3/clone3.c
@@ -3,6 +3,7 @@
/* Based on Christian Brauner's clone3() example */
#define _GNU_SOURCE
+#include <asm/mman.h>
#include <errno.h>
#include <inttypes.h>
#include <linux/types.h>
@@ -11,6 +12,7 @@
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
+#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/un.h>
@@ -19,8 +21,12 @@
#include <sched.h>
#include "kselftest.h"
+#include "ksft_shstk.h"
#include "clone3_selftests.h"
+static bool shadow_stack_supported;
+static size_t max_supported_args_size;
+
enum test_mode {
CLONE3_ARGS_NO_TEST,
CLONE3_ARGS_ALL_0,
@@ -28,6 +34,10 @@ enum test_mode {
CLONE3_ARGS_INVAL_EXIT_SIGNAL_NEG,
CLONE3_ARGS_INVAL_EXIT_SIGNAL_CSIG,
CLONE3_ARGS_INVAL_EXIT_SIGNAL_NSIG,
+ CLONE3_ARGS_SHADOW_STACK,
+ CLONE3_ARGS_SHADOW_STACK_MISALIGNED,
+ CLONE3_ARGS_SHADOW_STACK_NO_TOKEN,
+ CLONE3_ARGS_SHADOW_STACK_NORMAL_MEMORY,
};
typedef bool (*filter_function)(void);
@@ -44,6 +54,44 @@ struct test {
filter_function filter;
};
+
+/*
+ * We check for shadow stack support by attempting to use
+ * map_shadow_stack() since features may have been locked by the
+ * dynamic linker resulting in spurious errors when we attempt to
+ * enable on startup. We warn if the enable failed.
+ */
+static void test_shadow_stack_supported(void)
+{
+ long ret;
+
+ ret = syscall(__NR_map_shadow_stack, 0, getpagesize(), 0);
+ if (ret == -1) {
+ ksft_print_msg("map_shadow_stack() not supported\n");
+ } else if ((void *)ret == MAP_FAILED) {
+ ksft_print_msg("Failed to map shadow stack\n");
+ } else {
+ ksft_print_msg("Shadow stack supported\n");
+ shadow_stack_supported = true;
+
+ if (!shadow_stack_enabled)
+ ksft_print_msg("Mapped but did not enable shadow stack\n");
+ }
+}
+
+static void *get_shadow_stack_page(unsigned long flags)
+{
+ unsigned long long page;
+
+ page = syscall(__NR_map_shadow_stack, 0, getpagesize(), flags);
+ if ((void *)page == MAP_FAILED) {
+ ksft_print_msg("map_shadow_stack() failed: %d\n", errno);
+ return 0;
+ }
+
+ return (void *)page;
+}
+
static int call_clone3(uint64_t flags, size_t size, enum test_mode test_mode)
{
struct __clone_args args = {
@@ -57,6 +105,7 @@ static int call_clone3(uint64_t flags, size_t size, enum test_mode test_mode)
} args_ext;
pid_t pid = -1;
+ void *p;
int status;
memset(&args_ext, 0, sizeof(args_ext));
@@ -89,6 +138,26 @@ static int call_clone3(uint64_t flags, size_t size, enum test_mode test_mode)
case CLONE3_ARGS_INVAL_EXIT_SIGNAL_NSIG:
args.exit_signal = 0x00000000000000f0ULL;
break;
+ case CLONE3_ARGS_SHADOW_STACK:
+ p = get_shadow_stack_page(SHADOW_STACK_SET_TOKEN);
+ p += getpagesize() - sizeof(void *);
+ args.shstk_token = (unsigned long long)p;
+ break;
+ case CLONE3_ARGS_SHADOW_STACK_MISALIGNED:
+ p = get_shadow_stack_page(SHADOW_STACK_SET_TOKEN);
+ p += getpagesize() - sizeof(void *) - 1;
+ args.shstk_token = (unsigned long long)p;
+ break;
+ case CLONE3_ARGS_SHADOW_STACK_NORMAL_MEMORY:
+ p = malloc(getpagesize());
+ p += getpagesize() - sizeof(void *);
+ args.shstk_token = (unsigned long long)p;
+ break;
+ case CLONE3_ARGS_SHADOW_STACK_NO_TOKEN:
+ p = get_shadow_stack_page(0);
+ p += getpagesize() - sizeof(void *);
+ args.shstk_token = (unsigned long long)p;
+ break;
}
memcpy(&args_ext.args, &args, sizeof(struct __clone_args));
@@ -102,7 +171,12 @@ static int call_clone3(uint64_t flags, size_t size, enum test_mode test_mode)
if (pid == 0) {
ksft_print_msg("I am the child, my PID is %d\n", getpid());
- _exit(EXIT_SUCCESS);
+ /*
+ * Use a raw syscall to ensure we don't get issues
+ * with manually specified shadow stack and exit handlers.
+ */
+ syscall(__NR_exit, EXIT_SUCCESS);
+ ksft_print_msg("CHILD FAILED TO EXIT PID is %d\n", getpid());
}
ksft_print_msg("I am the parent (%d). My child's pid is %d\n",
@@ -184,6 +258,26 @@ static bool no_timenamespace(void)
return true;
}
+static bool have_shadow_stack(void)
+{
+ if (shadow_stack_supported) {
+ ksft_print_msg("Shadow stack supported\n");
+ return true;
+ }
+
+ return false;
+}
+
+static bool no_shadow_stack(void)
+{
+ if (!shadow_stack_supported) {
+ ksft_print_msg("Shadow stack not supported\n");
+ return true;
+ }
+
+ return false;
+}
+
static size_t page_size_plus_8(void)
{
return getpagesize() + 8;
@@ -327,6 +421,50 @@ static const struct test tests[] = {
.expected = -EINVAL,
.test_mode = CLONE3_ARGS_NO_TEST,
},
+ {
+ .name = "Shadow stack on system with shadow stack",
+ .size = 0,
+ .expected = 0,
+ .e2big_valid = true,
+ .test_mode = CLONE3_ARGS_SHADOW_STACK,
+ .filter = no_shadow_stack,
+ },
+ {
+ .name = "Shadow stack with misaligned address",
+ .flags = CLONE_VM,
+ .size = 0,
+ .expected = -EINVAL,
+ .e2big_valid = true,
+ .test_mode = CLONE3_ARGS_SHADOW_STACK_MISALIGNED,
+ .filter = no_shadow_stack,
+ },
+ {
+ .name = "Shadow stack with normal memory",
+ .flags = CLONE_VM,
+ .size = 0,
+ .expected = -EFAULT,
+ .e2big_valid = true,
+ .test_mode = CLONE3_ARGS_SHADOW_STACK_NORMAL_MEMORY,
+ .filter = no_shadow_stack,
+ },
+ {
+ .name = "Shadow stack with no token",
+ .flags = CLONE_VM,
+ .size = 0,
+ .expected = -EINVAL,
+ .e2big_valid = true,
+ .test_mode = CLONE3_ARGS_SHADOW_STACK_NO_TOKEN,
+ .filter = no_shadow_stack,
+ },
+ {
+ .name = "Shadow stack on system without shadow stack",
+ .flags = CLONE_VM,
+ .size = 0,
+ .expected = -EFAULT,
+ .e2big_valid = true,
+ .test_mode = CLONE3_ARGS_SHADOW_STACK_NORMAL_MEMORY,
+ .filter = have_shadow_stack,
+ },
};
int main(int argc, char *argv[])
@@ -334,9 +472,12 @@ int main(int argc, char *argv[])
size_t size;
int i;
+ enable_shadow_stack();
+
ksft_print_header();
ksft_set_plan(ARRAY_SIZE(tests));
test_clone3_supported();
+ test_shadow_stack_supported();
for (i = 0; i < ARRAY_SIZE(tests); i++)
test_clone3(&tests[i]);
diff --git a/tools/testing/selftests/clone3/clone3_selftests.h b/tools/testing/selftests/clone3/clone3_selftests.h
index a7ab2f1cccda..97d98d07fb78 100644
--- a/tools/testing/selftests/clone3/clone3_selftests.h
+++ b/tools/testing/selftests/clone3/clone3_selftests.h
@@ -31,12 +31,75 @@ struct __clone_args {
__aligned_u64 set_tid;
__aligned_u64 set_tid_size;
__aligned_u64 cgroup;
+#ifndef CLONE_ARGS_SIZE_VER2
+#define CLONE_ARGS_SIZE_VER2 88 /* sizeof third published struct */
+#endif
+ __aligned_u64 shstk_token;
+#ifndef CLONE_ARGS_SIZE_VER3
+#define CLONE_ARGS_SIZE_VER3 96 /* sizeof fourth published struct */
+#endif
};
+/*
+ * For architectures with shadow stack support we need to be
+ * absolutely sure that the clone3() syscall will be inline and not a
+ * function call so we open code.
+ */
+#ifdef __x86_64__
+static __always_inline pid_t sys_clone3(struct __clone_args *args, size_t size)
+{
+ register long _num __asm__ ("rax") = __NR_clone3;
+ register long _args __asm__ ("rdi") = (long)(args);
+ register long _size __asm__ ("rsi") = (long)(size);
+ long ret;
+
+ __asm__ volatile (
+ "syscall\n"
+ : "=a"(ret)
+ : "r"(_args), "r"(_size),
+ "0"(_num)
+ : "rcx", "r11", "memory", "cc"
+ );
+
+ if (ret < 0) {
+ errno = -ret;
+ return -1;
+ }
+
+ return ret;
+}
+#elif defined(__aarch64__)
+static __always_inline pid_t sys_clone3(struct __clone_args *args, size_t size)
+{
+ register long _num __asm__ ("x8") = __NR_clone3;
+ register long _args __asm__ ("x0") = (long)(args);
+ register long _size __asm__ ("x1") = (long)(size);
+ register long arg2 __asm__ ("x2") = 0;
+ register long arg3 __asm__ ("x3") = 0;
+ register long arg4 __asm__ ("x4") = 0;
+
+ __asm__ volatile (
+ "svc #0\n"
+ : "=r"(_args)
+ : "r"(_args), "r"(_size),
+ "r"(_num), "r"(arg2),
+ "r"(arg3), "r"(arg4)
+ : "memory", "cc"
+ );
+
+ if ((int)_args < 0) {
+ errno = -((int)_args);
+ return -1;
+ }
+
+ return _args;
+}
+#else
static pid_t sys_clone3(struct __clone_args *args, size_t size)
{
return syscall(__NR_clone3, args, size);
}
+#endif
static inline void test_clone3_supported(void)
{
--
2.47.3
* [PATCH v23 7/8] selftests/clone3: Allow tests to flag if -E2BIG is a valid error code
From: Mark Brown @ 2025-12-18 8:10 UTC
To: Rick P. Edgecombe, Deepak Gupta, H.J. Lu, Florian Weimer,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Christian Brauner, Shuah Khan
Cc: linux-kernel, Catalin Marinas, Will Deacon, jannh, bsegall,
Andrew Morton, Yury Khrustalev, H.J. Lu, Adhemerval Zanella Netto,
Wilco Dijkstra, Carlos O'Donell, Florian Weimer, Rich Felker,
linux-kselftest, linux-api, Mark Brown, Kees Cook, Kees Cook,
Shuah Khan
In-Reply-To: <20251218-clone3-shadow-stack-v23-0-7cb318fbb385@kernel.org>
The clone_args structure is extensible, with the syscall passing in the
length of the structure. Inside the kernel we use copy_struct_from_user()
to read the struct but this has the unfortunate side effect of silently
accepting some overrun in the structure size providing the extra data is
all zeros. This means that we can't discover the clone3() features that
the running kernel supports by simply probing with various struct sizes.
We need to check this for the benefit of test systems which run newer
kselftests on old kernels.
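The size handling described above can be sketched in userspace as follows. This is an illustrative model under stated assumptions, not the kernel's copy_struct_from_user(): the function name model_copy_struct is hypothetical and plain memory stands in for user memory.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model of the size handling described above. This is an
 * illustrative sketch, not the kernel's copy_struct_from_user().
 */
static int model_copy_struct(void *dst, size_t ksize,
			     const void *src, size_t usize)
{
	size_t size = usize < ksize ? usize : ksize;
	const unsigned char *extra = (const unsigned char *)src + size;
	size_t i;

	/* Caller's struct is larger: bytes we do not know must be zero. */
	for (i = 0; usize > ksize && i < usize - ksize; i++) {
		if (extra[i])
			return -E2BIG;
	}

	/* Caller's struct is smaller: treat the missing fields as zero. */
	memset(dst, 0, ksize);
	memcpy(dst, src, size);
	return 0;
}
```

In this model a 96 byte struct is accepted by an 88 byte kernel as long as the trailing 8 bytes are zero, and is rejected with -E2BIG once they are set, which is why probing with struct sizes alone does not reveal feature support.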
Add a flag which can be set on a test to indicate that clone3() may return
-E2BIG due to the use of newer struct versions. Currently no tests need
this but it will become an issue for testing clone3() support for shadow
stacks, since support for shadow stacks is already present on x86.
Reviewed-by: Kees Cook <kees@kernel.org>
Tested-by: Kees Cook <kees@kernel.org>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
tools/testing/selftests/clone3/clone3.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/testing/selftests/clone3/clone3.c b/tools/testing/selftests/clone3/clone3.c
index 8c852d022c55..932cc64e9ae4 100644
--- a/tools/testing/selftests/clone3/clone3.c
+++ b/tools/testing/selftests/clone3/clone3.c
@@ -39,6 +39,7 @@ struct test {
size_t size;
size_function size_function;
int expected;
+ bool e2big_valid;
enum test_mode test_mode;
filter_function filter;
};
@@ -146,6 +147,11 @@ static void test_clone3(const struct test *test)
ksft_print_msg("[%d] clone3() with flags says: %d expected %d\n",
getpid(), ret, test->expected);
if (ret != test->expected) {
+ if (test->e2big_valid && ret == -E2BIG) {
+ ksft_print_msg("Test reported -E2BIG\n");
+ ksft_test_result_skip("%s\n", test->name);
+ return;
+ }
ksft_print_msg(
"[%d] Result (%d) is different than expected (%d)\n",
getpid(), ret, test->expected);
--
2.47.3
* [PATCH v23 6/8] selftests/clone3: Factor more of main loop into test_clone3()
From: Mark Brown @ 2025-12-18 8:10 UTC
To: Rick P. Edgecombe, Deepak Gupta, H.J. Lu, Florian Weimer,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Christian Brauner, Shuah Khan
Cc: linux-kernel, Catalin Marinas, Will Deacon, jannh, bsegall,
Andrew Morton, Yury Khrustalev, H.J. Lu, Adhemerval Zanella Netto,
Wilco Dijkstra, Carlos O'Donell, Florian Weimer, Rich Felker,
linux-kselftest, linux-api, Mark Brown, Kees Cook, Kees Cook,
Shuah Khan
In-Reply-To: <20251218-clone3-shadow-stack-v23-0-7cb318fbb385@kernel.org>
In order to make it easier to add more configuration for the tests and
more support for runtime detection of when tests can be run, pass the
structure describing the tests into test_clone3() rather than picking
the arguments out of it, and have that function do all the per-test work.
No functional change.
Reviewed-by: Kees Cook <kees@kernel.org>
Tested-by: Kees Cook <kees@kernel.org>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
tools/testing/selftests/clone3/clone3.c | 77 ++++++++++++++++-----------------
1 file changed, 37 insertions(+), 40 deletions(-)
diff --git a/tools/testing/selftests/clone3/clone3.c b/tools/testing/selftests/clone3/clone3.c
index 289e0c7c1f09..8c852d022c55 100644
--- a/tools/testing/selftests/clone3/clone3.c
+++ b/tools/testing/selftests/clone3/clone3.c
@@ -30,6 +30,19 @@ enum test_mode {
CLONE3_ARGS_INVAL_EXIT_SIGNAL_NSIG,
};
+typedef bool (*filter_function)(void);
+typedef size_t (*size_function)(void);
+
+struct test {
+ const char *name;
+ uint64_t flags;
+ size_t size;
+ size_function size_function;
+ int expected;
+ enum test_mode test_mode;
+ filter_function filter;
+};
+
static int call_clone3(uint64_t flags, size_t size, enum test_mode test_mode)
{
struct __clone_args args = {
@@ -109,30 +122,40 @@ static int call_clone3(uint64_t flags, size_t size, enum test_mode test_mode)
return 0;
}
-static bool test_clone3(uint64_t flags, size_t size, int expected,
- enum test_mode test_mode)
+static void test_clone3(const struct test *test)
{
+ size_t size;
int ret;
+ if (test->filter && test->filter()) {
+ ksft_test_result_skip("%s\n", test->name);
+ return;
+ }
+
+ if (test->size_function)
+ size = test->size_function();
+ else
+ size = test->size;
+
+ ksft_print_msg("Running test '%s'\n", test->name);
+
ksft_print_msg(
"[%d] Trying clone3() with flags %#" PRIx64 " (size %zu)\n",
- getpid(), flags, size);
- ret = call_clone3(flags, size, test_mode);
+ getpid(), test->flags, size);
+ ret = call_clone3(test->flags, size, test->test_mode);
ksft_print_msg("[%d] clone3() with flags says: %d expected %d\n",
- getpid(), ret, expected);
- if (ret != expected) {
+ getpid(), ret, test->expected);
+ if (ret != test->expected) {
ksft_print_msg(
"[%d] Result (%d) is different than expected (%d)\n",
- getpid(), ret, expected);
- return false;
+ getpid(), ret, test->expected);
+ ksft_test_result_fail("%s\n", test->name);
+ return;
}
- return true;
+ ksft_test_result_pass("%s\n", test->name);
}
-typedef bool (*filter_function)(void);
-typedef size_t (*size_function)(void);
-
static bool not_root(void)
{
if (getuid() != 0) {
@@ -160,16 +183,6 @@ static size_t page_size_plus_8(void)
return getpagesize() + 8;
}
-struct test {
- const char *name;
- uint64_t flags;
- size_t size;
- size_function size_function;
- int expected;
- enum test_mode test_mode;
- filter_function filter;
-};
-
static const struct test tests[] = {
{
.name = "simple clone3()",
@@ -319,24 +332,8 @@ int main(int argc, char *argv[])
ksft_set_plan(ARRAY_SIZE(tests));
test_clone3_supported();
- for (i = 0; i < ARRAY_SIZE(tests); i++) {
- if (tests[i].filter && tests[i].filter()) {
- ksft_test_result_skip("%s\n", tests[i].name);
- continue;
- }
-
- if (tests[i].size_function)
- size = tests[i].size_function();
- else
- size = tests[i].size;
-
- ksft_print_msg("Running test '%s'\n", tests[i].name);
-
- ksft_test_result(test_clone3(tests[i].flags, size,
- tests[i].expected,
- tests[i].test_mode),
- "%s\n", tests[i].name);
- }
+ for (i = 0; i < ARRAY_SIZE(tests); i++)
+ test_clone3(&tests[i]);
ksft_finished();
}
--
2.47.3
* [PATCH v23 5/8] selftests/clone3: Remove redundant flushes of output streams
From: Mark Brown @ 2025-12-18 8:10 UTC
To: Rick P. Edgecombe, Deepak Gupta, H.J. Lu, Florian Weimer,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Christian Brauner, Shuah Khan
Cc: linux-kernel, Catalin Marinas, Will Deacon, jannh, bsegall,
Andrew Morton, Yury Khrustalev, H.J. Lu, Adhemerval Zanella Netto,
Wilco Dijkstra, Carlos O'Donell, Florian Weimer, Rich Felker,
linux-kselftest, linux-api, Mark Brown, Kees Cook, Kees Cook,
Shuah Khan
In-Reply-To: <20251218-clone3-shadow-stack-v23-0-7cb318fbb385@kernel.org>
Since there were widespread issues with output not being flushed, the
kselftest framework was modified to explicitly set the output streams
unbuffered in commit 58e2847ad2e6 ("selftests: line buffer test
program's stdout"), so there is no need to explicitly flush in the clone3
tests.
Reviewed-by: Kees Cook <kees@kernel.org>
Tested-by: Kees Cook <kees@kernel.org>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
tools/testing/selftests/clone3/clone3_selftests.h | 2 --
1 file changed, 2 deletions(-)
diff --git a/tools/testing/selftests/clone3/clone3_selftests.h b/tools/testing/selftests/clone3/clone3_selftests.h
index a0593e8950f0..a7ab2f1cccda 100644
--- a/tools/testing/selftests/clone3/clone3_selftests.h
+++ b/tools/testing/selftests/clone3/clone3_selftests.h
@@ -35,8 +35,6 @@ struct __clone_args {
static pid_t sys_clone3(struct __clone_args *args, size_t size)
{
- fflush(stdout);
- fflush(stderr);
return syscall(__NR_clone3, args, size);
}
--
2.47.3
* [PATCH v23 4/8] fork: Add shadow stack support to clone3()
From: Mark Brown @ 2025-12-18 8:10 UTC
To: Rick P. Edgecombe, Deepak Gupta, H.J. Lu, Florian Weimer,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Christian Brauner, Shuah Khan
Cc: linux-kernel, Catalin Marinas, Will Deacon, jannh, bsegall,
Andrew Morton, Yury Khrustalev, H.J. Lu, Adhemerval Zanella Netto,
Wilco Dijkstra, Carlos O'Donell, Florian Weimer, Rich Felker,
linux-kselftest, linux-api, Mark Brown, Kees Cook
In-Reply-To: <20251218-clone3-shadow-stack-v23-0-7cb318fbb385@kernel.org>
Unlike with the normal stack there is no API for configuring the shadow
stack for a new thread. Instead the kernel will dynamically allocate a
new shadow stack with the same size as the normal stack. This appears to
be due to the shadow stack series having been in development since
before the more extensible clone3() was added rather than anything more
deliberate.
Add a parameter to clone3() specifying a shadow stack pointer to use
for the new thread. This is inconsistent with the way we specify the
normal stack, but during review concerns were expressed about having to
identify where the shadow stack pointer should be placed, especially in
cases where the shadow stack has been previously active. If no shadow
stack is specified then the existing implicit allocation behaviour is
maintained.
If a shadow stack pointer is specified then it is required to have an
architecture defined token placed on the stack, which will be consumed by
the new task. The shadow stack is specified by pointing to this token. If
no valid token is present then this will be reported with -EINVAL. This
token prevents new threads being created pointing at the shadow stack of
an existing running thread. On architectures with support for userspace
pivoting of shadow stacks it is expected that the same format and placement
of tokens will be used; this is the case for arm64 and x86.
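A minimal userspace model of the token consumption described above, assuming the x86 token value used in this patch, (address + SS_FRAME_SIZE) | BIT(0), with SS_FRAME_SIZE assumed to be 8. The function names are hypothetical and a C11 atomic stands in for the kernel's cmpxchg on the user page.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SS_FRAME_SIZE 8	/* assumption: 64-bit shadow stack slot size */

/* Token value the x86 code in this patch expects at 'addr'. */
static uint64_t x86_expected_token(uint64_t addr)
{
	return (addr + SS_FRAME_SIZE) | 1;
}

/*
 * Model of token consumption: atomically swap the expected token for
 * zero. This can succeed at most once for a given token.
 */
static bool model_consume_token(_Atomic uint64_t *slot, uint64_t addr)
{
	uint64_t expected = x86_expected_token(addr);

	return atomic_compare_exchange_strong(slot, &expected, 0);
}
```

Because the slot is zeroed on success, a second clone3() pointing at the same token fails in this model, which is the property that stops a new thread from being started on a shadow stack that is already in use.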
If the architecture does not support shadow stacks the shadow stack
pointer must not be specified. Architectures that do support the
feature are expected to enforce the same requirement on individual
systems that lack shadow stack support.
Update the existing arm64 and x86 implementations to pay attention to
the newly added arguments. In order to maintain compatibility we use the
existing behaviour if no shadow stack is specified. Since we are now
using more fields from the kernel_clone_args we pass that into the
shadow stack code rather than individual fields.
Portions of the x86 architecture code were written by Rick Edgecombe.
Acked-by: Yury Khrustalev <yury.khrustalev@arm.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
arch/arm64/mm/gcs.c | 47 +++++++++++++++++++-
arch/x86/include/asm/shstk.h | 11 +++--
arch/x86/kernel/process.c | 2 +-
arch/x86/kernel/shstk.c | 53 ++++++++++++++++++++---
include/asm-generic/cacheflush.h | 11 +++++
include/linux/sched/task.h | 17 ++++++++
include/uapi/linux/sched.h | 9 ++--
kernel/fork.c | 93 ++++++++++++++++++++++++++++++++++------
8 files changed, 217 insertions(+), 26 deletions(-)
diff --git a/arch/arm64/mm/gcs.c b/arch/arm64/mm/gcs.c
index 3abcbf9adb5c..fd1d5a6655de 100644
--- a/arch/arm64/mm/gcs.c
+++ b/arch/arm64/mm/gcs.c
@@ -43,8 +43,23 @@ int gcs_alloc_thread_stack(struct task_struct *tsk,
{
unsigned long addr, size;
- if (!system_supports_gcs())
+ if (!system_supports_gcs()) {
+ if (args->shstk_token)
+ return -EINVAL;
+
return 0;
+ }
+
+ /*
+ * If the user specified a GCS then use it, otherwise fall
+ * back to a default allocation strategy. Validation is done
+ * in arch_shstk_validate_clone().
+ */
+ if (args->shstk_token) {
+ tsk->thread.gcs_base = 0;
+ tsk->thread.gcs_size = 0;
+ return 0;
+ }
if (!task_gcs_el0_enabled(tsk))
return 0;
@@ -68,6 +83,36 @@ int gcs_alloc_thread_stack(struct task_struct *tsk,
return 0;
}
+static bool gcs_consume_token(struct vm_area_struct *vma, struct page *page,
+ unsigned long user_addr)
+{
+ u64 expected = GCS_CAP(user_addr);
+ u64 *token = page_address(page) + offset_in_page(user_addr);
+
+ if (!cmpxchg_to_user_page(vma, page, user_addr, token, expected, 0))
+ return false;
+ set_page_dirty_lock(page);
+
+ return true;
+}
+
+int arch_shstk_validate_clone(struct task_struct *tsk,
+ struct vm_area_struct *vma,
+ struct page *page,
+ struct kernel_clone_args *args)
+{
+ unsigned long gcspr_el0;
+ int ret = 0;
+
+ gcspr_el0 = args->shstk_token;
+ if (!gcs_consume_token(vma, page, gcspr_el0))
+ return -EINVAL;
+
+ tsk->thread.gcspr_el0 = gcspr_el0 + sizeof(u64);
+
+ return ret;
+}
+
SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags)
{
unsigned long alloc_size;
diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h
index fc7dcec58fd4..42a5f03a51e2 100644
--- a/arch/x86/include/asm/shstk.h
+++ b/arch/x86/include/asm/shstk.h
@@ -6,6 +6,7 @@
#include <linux/types.h>
struct task_struct;
+struct kernel_clone_args;
struct ksignal;
#ifdef CONFIG_X86_USER_SHADOW_STACK
@@ -16,8 +17,8 @@ struct thread_shstk {
long shstk_prctl(struct task_struct *task, int option, unsigned long arg2);
void reset_thread_features(void);
-unsigned long shstk_alloc_thread_stack(struct task_struct *p, u64 clone_flags,
- unsigned long stack_size);
+unsigned long shstk_alloc_thread_stack(struct task_struct *p,
+ const struct kernel_clone_args *args);
void shstk_free(struct task_struct *p);
int setup_signal_shadow_stack(struct ksignal *ksig);
int restore_signal_shadow_stack(void);
@@ -30,8 +31,10 @@ static inline long shstk_prctl(struct task_struct *task, int option,
unsigned long arg2) { return -EINVAL; }
static inline void reset_thread_features(void) {}
static inline unsigned long shstk_alloc_thread_stack(struct task_struct *p,
- u64 clone_flags,
- unsigned long stack_size) { return 0; }
+ const struct kernel_clone_args *args)
+{
+ return 0;
+}
static inline void shstk_free(struct task_struct *p) {}
static inline int setup_signal_shadow_stack(struct ksignal *ksig) { return 0; }
static inline int restore_signal_shadow_stack(void) { return 0; }
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 4c718f8adc59..449da29a9f92 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -219,7 +219,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
* is disabled, new_ssp will remain 0, and fpu_clone() will know not to
* update it.
*/
- new_ssp = shstk_alloc_thread_stack(p, clone_flags, args->stack_size);
+ new_ssp = shstk_alloc_thread_stack(p, args);
if (IS_ERR_VALUE(new_ssp))
return PTR_ERR((void *)new_ssp);
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index 978232b6d48d..d66c866274d0 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -191,18 +191,61 @@ void reset_thread_features(void)
current->thread.features_locked = 0;
}
-unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, u64 clone_flags,
- unsigned long stack_size)
+int arch_shstk_validate_clone(struct task_struct *t,
+ struct vm_area_struct *vma,
+ struct page *page,
+ struct kernel_clone_args *args)
+{
+ void *maddr = page_address(page);
+ unsigned long token;
+ int offset;
+ u64 expected;
+
+ /*
+ * kernel_clone_args() verification assures token address is 8
+ * byte aligned.
+ */
+ token = args->shstk_token;
+ expected = (token + SS_FRAME_SIZE) | BIT(0);
+ offset = offset_in_page(token);
+
+ if (!cmpxchg_to_user_page(vma, page, token, (unsigned long *)(maddr + offset),
+ expected, 0))
+ return -EINVAL;
+ set_page_dirty_lock(page);
+
+ return 0;
+}
+
+unsigned long shstk_alloc_thread_stack(struct task_struct *tsk,
+ const struct kernel_clone_args *args)
{
struct thread_shstk *shstk = &tsk->thread.shstk;
+ u64 clone_flags = args->flags;
unsigned long addr, size;
/*
* If shadow stack is not enabled on the new thread, skip any
- * switch to a new shadow stack.
+ * implicit switch to a new shadow stack and reject attempts to
+ * explicitly specify one.
*/
- if (!features_enabled(ARCH_SHSTK_SHSTK))
+ if (!features_enabled(ARCH_SHSTK_SHSTK)) {
+ if (args->shstk_token)
+ return (unsigned long)ERR_PTR(-EINVAL);
+
return 0;
+ }
+
+ /*
+ * If the user specified a shadow stack then use it, otherwise
+ * fall back to a default allocation strategy. Validation is
+ * done in arch_shstk_validate_clone().
+ */
+ if (args->shstk_token) {
+ shstk->base = 0;
+ shstk->size = 0;
+ return args->shstk_token + 8;
+ }
/*
* For CLONE_VFORK the child will share the parents shadow stack.
@@ -222,7 +265,7 @@ unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, u64 clone_flags,
if (!(clone_flags & CLONE_VM))
return 0;
- size = adjust_shstk_size(stack_size);
+ size = adjust_shstk_size(args->stack_size);
addr = alloc_shstk(0, size, 0, false);
if (IS_ERR_VALUE(addr))
return addr;
diff --git a/include/asm-generic/cacheflush.h b/include/asm-generic/cacheflush.h
index 7ee8a179d103..96cc0c7a5c90 100644
--- a/include/asm-generic/cacheflush.h
+++ b/include/asm-generic/cacheflush.h
@@ -124,4 +124,15 @@ static inline void flush_cache_vunmap(unsigned long start, unsigned long end)
} while (0)
#endif
+#ifndef cmpxchg_to_user_page
+#define cmpxchg_to_user_page(vma, page, vaddr, ptr, old, new) \
+({ \
+ bool ret; \
+ \
+ ret = try_cmpxchg(ptr, &old, new); \
+ flush_icache_user_page(vma, page, vaddr, sizeof(*ptr)); \
+ ret; \
+})
+#endif
+
#endif /* _ASM_GENERIC_CACHEFLUSH_H */
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 525aa2a632b2..7f860edc58da 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -16,6 +16,7 @@ struct task_struct;
struct rusage;
union thread_union;
struct css_set;
+struct vm_area_struct;
/* All the bits taken by the old clone syscall. */
#define CLONE_LEGACY_FLAGS 0xffffffffULL
@@ -44,6 +45,7 @@ struct kernel_clone_args {
struct cgroup *cgrp;
struct css_set *cset;
unsigned int kill_seq;
+ unsigned long shstk_token;
};
/*
@@ -225,4 +227,19 @@ static inline void task_unlock(struct task_struct *p)
DEFINE_GUARD(task_lock, struct task_struct *, task_lock(_T), task_unlock(_T))
+#ifdef CONFIG_ARCH_HAS_USER_SHADOW_STACK
+int arch_shstk_validate_clone(struct task_struct *p,
+ struct vm_area_struct *vma,
+ struct page *page,
+ struct kernel_clone_args *args);
+#else
+static inline int arch_shstk_validate_clone(struct task_struct *p,
+ struct vm_area_struct *vma,
+ struct page *page,
+ struct kernel_clone_args *args)
+{
+ return 0;
+}
+#endif
+
#endif /* _LINUX_SCHED_TASK_H */
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 359a14cc76a4..7e18e7b3df3a 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -84,6 +84,7 @@
* kernel's limit of nested PID namespaces.
* @cgroup: If CLONE_INTO_CGROUP is specified set this to
* a file descriptor for the cgroup.
+ * @shstk_token: Pointer to shadow stack token at top of stack.
*
* The structure is versioned by size and thus extensible.
* New struct members must go at the end of the struct and
@@ -101,12 +102,14 @@ struct clone_args {
__aligned_u64 set_tid;
__aligned_u64 set_tid_size;
__aligned_u64 cgroup;
+ __aligned_u64 shstk_token;
};
#endif
-#define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */
-#define CLONE_ARGS_SIZE_VER1 80 /* sizeof second published struct */
-#define CLONE_ARGS_SIZE_VER2 88 /* sizeof third published struct */
+#define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */
+#define CLONE_ARGS_SIZE_VER1 80 /* sizeof second published struct */
+#define CLONE_ARGS_SIZE_VER2 88 /* sizeof third published struct */
+#define CLONE_ARGS_SIZE_VER3 96 /* sizeof fourth published struct */
/*
* Scheduling policies
diff --git a/kernel/fork.c b/kernel/fork.c
index b1f3915d5f8e..0d171e9055d6 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1955,6 +1955,51 @@ static bool need_futex_hash_allocate_default(u64 clone_flags)
return true;
}
+static int shstk_validate_clone(struct task_struct *p,
+ struct kernel_clone_args *args)
+{
+ struct mm_struct *mm;
+ struct vm_area_struct *vma;
+ struct page *page;
+ unsigned long addr;
+ int ret;
+
+ if (!IS_ENABLED(CONFIG_ARCH_HAS_USER_SHADOW_STACK))
+ return 0;
+
+ if (!args->shstk_token)
+ return 0;
+
+ mm = get_task_mm(p);
+ if (!mm)
+ return -EFAULT;
+
+ mmap_read_lock(mm);
+
+ addr = untagged_addr_remote(mm, args->shstk_token);
+ page = get_user_page_vma_remote(mm, addr, FOLL_FORCE | FOLL_WRITE,
+ &vma);
+ if (IS_ERR(page)) {
+ ret = -EFAULT;
+ goto out;
+ }
+
+ if (!(vma->vm_flags & VM_SHADOW_STACK) ||
+ !(vma->vm_flags & VM_WRITE)) {
+ ret = -EFAULT;
+ goto out_page;
+ }
+
+ ret = arch_shstk_validate_clone(p, vma, page, args);
+
+out_page:
+ put_page(page);
+out:
+ mmap_read_unlock(mm);
+ mmput(mm);
+ return ret;
+}
+
/*
* This creates a new process as a copy of the old one,
* but does not actually start it yet.
@@ -2228,6 +2273,9 @@ __latent_entropy struct task_struct *copy_process(
if (retval)
goto bad_fork_cleanup_namespaces;
retval = copy_thread(p, args);
+ if (retval)
+ goto bad_fork_cleanup_io;
+ retval = shstk_validate_clone(p, args);
if (retval)
goto bad_fork_cleanup_io;
@@ -2807,7 +2855,9 @@ static noinline int copy_clone_args_from_user(struct kernel_clone_args *kargs,
CLONE_ARGS_SIZE_VER1);
BUILD_BUG_ON(offsetofend(struct clone_args, cgroup) !=
CLONE_ARGS_SIZE_VER2);
- BUILD_BUG_ON(sizeof(struct clone_args) != CLONE_ARGS_SIZE_VER2);
+ BUILD_BUG_ON(offsetofend(struct clone_args, shstk_token) !=
+ CLONE_ARGS_SIZE_VER3);
+ BUILD_BUG_ON(sizeof(struct clone_args) != CLONE_ARGS_SIZE_VER3);
if (unlikely(usize > PAGE_SIZE))
return -E2BIG;
@@ -2840,16 +2890,17 @@ static noinline int copy_clone_args_from_user(struct kernel_clone_args *kargs,
return -EINVAL;
*kargs = (struct kernel_clone_args){
- .flags = args.flags,
- .pidfd = u64_to_user_ptr(args.pidfd),
- .child_tid = u64_to_user_ptr(args.child_tid),
- .parent_tid = u64_to_user_ptr(args.parent_tid),
- .exit_signal = args.exit_signal,
- .stack = args.stack,
- .stack_size = args.stack_size,
- .tls = args.tls,
- .set_tid_size = args.set_tid_size,
- .cgroup = args.cgroup,
+ .flags = args.flags,
+ .pidfd = u64_to_user_ptr(args.pidfd),
+ .child_tid = u64_to_user_ptr(args.child_tid),
+ .parent_tid = u64_to_user_ptr(args.parent_tid),
+ .exit_signal = args.exit_signal,
+ .stack = args.stack,
+ .stack_size = args.stack_size,
+ .tls = args.tls,
+ .set_tid_size = args.set_tid_size,
+ .cgroup = args.cgroup,
+ .shstk_token = args.shstk_token,
};
if (args.set_tid &&
@@ -2890,6 +2941,24 @@ static inline bool clone3_stack_valid(struct kernel_clone_args *kargs)
return true;
}
+/**
+ * clone3_shadow_stack_valid - check and prepare shadow stack
+ * @kargs: kernel clone args
+ *
+ * Verify that shadow stacks are only enabled if supported.
+ */
+static inline bool clone3_shadow_stack_valid(struct kernel_clone_args *kargs)
+{
+ if (!kargs->shstk_token)
+ return true;
+
+ if (!IS_ALIGNED(kargs->shstk_token, sizeof(void *)))
+ return false;
+
+ /* Fail if the kernel wasn't built with shadow stacks */
+ return IS_ENABLED(CONFIG_ARCH_HAS_USER_SHADOW_STACK);
+}
+
static bool clone3_args_valid(struct kernel_clone_args *kargs)
{
/* Verify that no unknown flags are passed along. */
@@ -2912,7 +2981,7 @@ static bool clone3_args_valid(struct kernel_clone_args *kargs)
kargs->exit_signal)
return false;
- if (!clone3_stack_valid(kargs))
+ if (!clone3_stack_valid(kargs) || !clone3_shadow_stack_valid(kargs))
return false;
return true;
--
2.47.3
* [PATCH v23 3/8] selftests: Provide helper header for shadow stack testing
From: Mark Brown @ 2025-12-18 8:10 UTC (permalink / raw)
To: Rick P. Edgecombe, Deepak Gupta, H.J. Lu, Florian Weimer,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Christian Brauner, Shuah Khan
Cc: linux-kernel, Catalin Marinas, Will Deacon, jannh, bsegall,
Andrew Morton, Yury Khrustalev, H.J. Lu, Adhemerval Zanella Netto,
Wilco Dijkstra, Carlos O'Donell, Florian Weimer, Rich Felker,
linux-kselftest, linux-api, Mark Brown, Kees Cook,
Shuah Khan
In-Reply-To: <20251218-clone3-shadow-stack-v23-0-7cb318fbb385@kernel.org>
While almost all users of shadow stacks should be relying on the dynamic
linker and libc to enable the feature, there are several low level test
programs where it is useful to enable it without any libc support, allowing
testing without full system enablement. This low level testing is helpful
during bringup of the support itself, and also in enabling coverage by
automated testing without needing all system components in the target root
filesystems to have enablement.
Provide a header with helpers for this purpose, intended for use only by
test programs directly exercising shadow stack interfaces.
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Tested-by: Kees Cook <kees@kernel.org>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
tools/testing/selftests/ksft_shstk.h | 98 ++++++++++++++++++++++++++++++++++++
1 file changed, 98 insertions(+)
diff --git a/tools/testing/selftests/ksft_shstk.h b/tools/testing/selftests/ksft_shstk.h
new file mode 100644
index 000000000000..fecf91218ea5
--- /dev/null
+++ b/tools/testing/selftests/ksft_shstk.h
@@ -0,0 +1,98 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Helpers for shadow stack enablement, this is intended to only be
+ * used by low level test programs directly exercising interfaces for
+ * working with shadow stacks.
+ *
+ * Copyright (C) 2024 ARM Ltd.
+ */
+
+#ifndef __KSFT_SHSTK_H
+#define __KSFT_SHSTK_H
+
+#include <asm/mman.h>
+
+/* This is currently only defined for x86 */
+#ifndef SHADOW_STACK_SET_TOKEN
+#define SHADOW_STACK_SET_TOKEN (1ULL << 0)
+#endif
+
+static bool shadow_stack_enabled;
+
+#ifdef __x86_64__
+#define ARCH_SHSTK_ENABLE 0x5001
+#define ARCH_SHSTK_SHSTK (1ULL << 0)
+
+#define ARCH_PRCTL(arg1, arg2) \
+({ \
+ long _ret; \
+ register long _num asm("eax") = __NR_arch_prctl; \
+ register long _arg1 asm("rdi") = (long)(arg1); \
+ register long _arg2 asm("rsi") = (long)(arg2); \
+ \
+ asm volatile ( \
+ "syscall\n" \
+ : "=a"(_ret) \
+ : "r"(_arg1), "r"(_arg2), \
+ "0"(_num) \
+ : "rcx", "r11", "memory", "cc" \
+ ); \
+ _ret; \
+})
+
+#define ENABLE_SHADOW_STACK
+static __always_inline void enable_shadow_stack(void)
+{
+ int ret = ARCH_PRCTL(ARCH_SHSTK_ENABLE, ARCH_SHSTK_SHSTK);
+ if (ret == 0)
+ shadow_stack_enabled = true;
+}
+
+#endif
+
+#ifdef __aarch64__
+#define PR_SET_SHADOW_STACK_STATUS 75
+# define PR_SHADOW_STACK_ENABLE (1UL << 0)
+
+#define my_syscall2(num, arg1, arg2) \
+({ \
+ register long _num __asm__ ("x8") = (num); \
+ register long _arg1 __asm__ ("x0") = (long)(arg1); \
+ register long _arg2 __asm__ ("x1") = (long)(arg2); \
+ register long _arg3 __asm__ ("x2") = 0; \
+ register long _arg4 __asm__ ("x3") = 0; \
+ register long _arg5 __asm__ ("x4") = 0; \
+ \
+ __asm__ volatile ( \
+ "svc #0\n" \
+ : "=r"(_arg1) \
+ : "r"(_arg1), "r"(_arg2), \
+ "r"(_arg3), "r"(_arg4), \
+ "r"(_arg5), "r"(_num) \
+ : "memory", "cc" \
+ ); \
+ _arg1; \
+})
+
+#define ENABLE_SHADOW_STACK
+static __always_inline void enable_shadow_stack(void)
+{
+ int ret;
+
+ ret = my_syscall2(__NR_prctl, PR_SET_SHADOW_STACK_STATUS,
+ PR_SHADOW_STACK_ENABLE);
+ if (ret == 0)
+ shadow_stack_enabled = true;
+}
+
+#endif
+
+#ifndef __NR_map_shadow_stack
+#define __NR_map_shadow_stack 453
+#endif
+
+#ifndef ENABLE_SHADOW_STACK
+static inline void enable_shadow_stack(void) { }
+#endif
+
+#endif
--
2.47.3
* [PATCH v23 2/8] Documentation: userspace-api: Add shadow stack API documentation
From: Mark Brown @ 2025-12-18 8:10 UTC (permalink / raw)
To: Rick P. Edgecombe, Deepak Gupta, H.J. Lu, Florian Weimer,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Christian Brauner, Shuah Khan
Cc: linux-kernel, Catalin Marinas, Will Deacon, jannh, bsegall,
Andrew Morton, Yury Khrustalev, H.J. Lu, Adhemerval Zanella Netto,
Wilco Dijkstra, Carlos O'Donell, Florian Weimer, Rich Felker,
linux-kselftest, linux-api, Mark Brown, Kees Cook,
Shuah Khan
In-Reply-To: <20251218-clone3-shadow-stack-v23-0-7cb318fbb385@kernel.org>
There are a number of architectures with shadow stack features which we are
presenting to userspace with as consistent an API as we can (though there
are some architecture specifics). Especially given that there are some
important considerations for userspace code interacting directly with the
feature, let's provide some documentation covering the common aspects.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Tested-by: Kees Cook <kees@kernel.org>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Yury Khrustalev <yury.khrustalev@arm.com>
Reviewed-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/shadow_stack.rst | 44 ++++++++++++++++++++++++++++
2 files changed, 45 insertions(+)
diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index 8a61ac4c1bf1..64b0099ee161 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -63,6 +63,7 @@ Everything else
ELF
liveupdate
netlink/index
+ shadow_stack
sysfs-platform_profile
vduse
futex2
diff --git a/Documentation/userspace-api/shadow_stack.rst b/Documentation/userspace-api/shadow_stack.rst
new file mode 100644
index 000000000000..42617d0470ba
--- /dev/null
+++ b/Documentation/userspace-api/shadow_stack.rst
@@ -0,0 +1,44 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============
+Shadow Stacks
+=============
+
+Introduction
+============
+
+Several architectures have features which provide backward edge
+control flow protection through a hardware maintained stack, only
+writable by userspace through very limited operations. This feature
+is referred to as shadow stacks on Linux, on x86 it is part of Intel
+Control Enforcement Technology (CET), on arm64 it is the Guarded Control
+Stack feature (FEAT_GCS) and for RISC-V it is the Zicfiss extension.
+It is expected that this feature will normally be managed by the
+system dynamic linker and libc in ways broadly transparent to
+application code, this document covers interfaces and considerations.
+
+
+Enabling
+========
+
+Shadow stacks default to disabled when a userspace process is
+executed, they can be enabled for the current thread with a syscall:
+
+ - For x86 the ARCH_SHSTK_ENABLE arch_prctl()
+ - For other architectures the PR_SET_SHADOW_STACK_STATUS prctl()
+
+It is expected that this will normally be done by the dynamic linker.
+Any new threads created by a thread with shadow stacks enabled will
+themselves have shadow stacks enabled.
+
+
+Enablement considerations
+=========================
+
+- Returning from the function that enables shadow stacks without first
+ disabling them will cause a shadow stack exception. This includes
+ any syscall wrapper or other library functions, the syscall will need
+ to be inlined.
+- A lock feature allows userspace to prevent disabling of shadow stacks.
+- Functions that change the stack context, like longjmp() or ucontext
+ changes on signal return, will need support from libc.
--
2.47.3
* [PATCH v23 1/8] arm64/gcs: Return a success value from gcs_alloc_thread_stack()
From: Mark Brown @ 2025-12-18 8:10 UTC (permalink / raw)
To: Rick P. Edgecombe, Deepak Gupta, H.J. Lu, Florian Weimer,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Christian Brauner, Shuah Khan
Cc: linux-kernel, Catalin Marinas, Will Deacon, jannh, bsegall,
Andrew Morton, Yury Khrustalev, H.J. Lu, Adhemerval Zanella Netto,
Wilco Dijkstra, Carlos O'Donell, Florian Weimer, Rich Felker,
linux-kselftest, linux-api, Mark Brown, Kees Cook
In-Reply-To: <20251218-clone3-shadow-stack-v23-0-7cb318fbb385@kernel.org>
Currently, as a result of templating from x86 code, gcs_alloc_thread_stack()
returns a pointer as an unsigned long; however, on arm64 we don't actually use
this pointer value as anything other than a pass/fail flag. Simplify the
interface to just return an int with 0 on success and a negative error code
on failure.
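
As an illustration only, the conversion can be modelled in userspace. This
is a sketch, not code from the patch: is_err_value() and MAX_ERRNO imitate
the kernel's IS_ERR_VALUE() convention, and alloc_result_to_status() is a
hypothetical name chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's MAX_ERRNO (assumption for this sketch). */
#define MAX_ERRNO 4095UL

/* The encoding relies on long being as wide as a pointer. */
static_assert(sizeof(long) == sizeof(void *), "long must be pointer sized");

/* Imitates IS_ERR_VALUE(): the top MAX_ERRNO values encode -errno. */
bool is_err_value(unsigned long x)
{
	return x >= (unsigned long)-MAX_ERRNO;
}

/*
 * Hypothetical helper: collapse an "address or encoded error" value into
 * the 0 / negative errno convention returned by the simplified interface.
 */
int alloc_result_to_status(unsigned long addr)
{
	if (is_err_value(addr))
		return (int)(long)addr;	/* e.g. (unsigned long)-ENOMEM becomes -ENOMEM */
	return 0;			/* any valid address is simply "success" */
}
```

With this shape the caller only has to test for a non-zero return rather
than decode a pointer-sized value.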
Acked-by: Deepak Gupta <debug@rivosinc.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
arch/arm64/include/asm/gcs.h | 8 ++++----
arch/arm64/kernel/process.c | 8 ++++----
arch/arm64/mm/gcs.c | 8 ++++----
3 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/arch/arm64/include/asm/gcs.h b/arch/arm64/include/asm/gcs.h
index 8fa0707069e8..534ea5ae9281 100644
--- a/arch/arm64/include/asm/gcs.h
+++ b/arch/arm64/include/asm/gcs.h
@@ -64,8 +64,8 @@ static inline bool task_gcs_el0_enabled(struct task_struct *task)
void gcs_set_el0_mode(struct task_struct *task);
void gcs_free(struct task_struct *task);
void gcs_preserve_current_state(void);
-unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
- const struct kernel_clone_args *args);
+int gcs_alloc_thread_stack(struct task_struct *tsk,
+ const struct kernel_clone_args *args);
static inline int gcs_check_locked(struct task_struct *task,
unsigned long new_val)
@@ -171,8 +171,8 @@ static inline void put_user_gcs(unsigned long val, unsigned long __user *addr,
int *err) { }
static inline void push_user_gcs(unsigned long val, int *err) { }
-static inline unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
- const struct kernel_clone_args *args)
+static inline int gcs_alloc_thread_stack(struct task_struct *tsk,
+ const struct kernel_clone_args *args)
{
return -ENOTSUPP;
}
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index fba7ca102a8c..4dadc70df16b 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -299,7 +299,7 @@ static void flush_gcs(void)
static int copy_thread_gcs(struct task_struct *p,
const struct kernel_clone_args *args)
{
- unsigned long gcs;
+ int ret;
if (!system_supports_gcs())
return 0;
@@ -310,9 +310,9 @@ static int copy_thread_gcs(struct task_struct *p,
p->thread.gcs_el0_mode = current->thread.gcs_el0_mode;
p->thread.gcs_el0_locked = current->thread.gcs_el0_locked;
- gcs = gcs_alloc_thread_stack(p, args);
- if (IS_ERR_VALUE(gcs))
- return PTR_ERR((void *)gcs);
+ ret = gcs_alloc_thread_stack(p, args);
+ if (ret != 0)
+ return ret;
return 0;
}
diff --git a/arch/arm64/mm/gcs.c b/arch/arm64/mm/gcs.c
index 6e93f78de79b..3abcbf9adb5c 100644
--- a/arch/arm64/mm/gcs.c
+++ b/arch/arm64/mm/gcs.c
@@ -38,8 +38,8 @@ static unsigned long gcs_size(unsigned long size)
return max(PAGE_SIZE, size);
}
-unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
- const struct kernel_clone_args *args)
+int gcs_alloc_thread_stack(struct task_struct *tsk,
+ const struct kernel_clone_args *args)
{
unsigned long addr, size;
@@ -59,13 +59,13 @@ unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
size = gcs_size(size);
addr = alloc_gcs(0, size);
if (IS_ERR_VALUE(addr))
- return addr;
+ return PTR_ERR((void *)addr);
tsk->thread.gcs_base = addr;
tsk->thread.gcs_size = size;
tsk->thread.gcspr_el0 = addr + size - sizeof(u64);
- return addr;
+ return 0;
}
SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags)
--
2.47.3
* [PATCH v23 0/8] fork: Support shadow stacks in clone3()
From: Mark Brown @ 2025-12-18 8:10 UTC (permalink / raw)
To: Rick P. Edgecombe, Deepak Gupta, H.J. Lu, Florian Weimer,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Christian Brauner, Shuah Khan
Cc: linux-kernel, Catalin Marinas, Will Deacon, jannh, bsegall,
Andrew Morton, Yury Khrustalev, H.J. Lu, Adhemerval Zanella Netto,
Wilco Dijkstra, Carlos O'Donell, Florian Weimer, Rich Felker,
linux-kselftest, linux-api, Mark Brown, Kees Cook,
Shuah Khan
At this point I think everyone on the kernel side is happy with
this but there were some questions from the glibc side about the value
of controlling the shadow stack placement and size, especially with the
current inability to reuse the shadow stack for an exited thread. With
support for reuse it would be possible to have a cache of shadow stacks
as is currently supported for the normal stack.
Since the discussion petered out I'm resending this in order to give
people something to work with while prototyping. It should be possible to
prototype any potential kernel features to help build out shadow stack
support in userspace by enabling shadow stack writes; as suggested by
Rick Edgecombe, this may end up being required anyway for supporting more
exotic scenarios. On all current architectures with the feature, writes
to the shadow stack require specific instructions, so there are still
security benefits even with writes enabled.
I did send a change implementing a feature writing a token on thread
exit to allow reuse:
https://lore.kernel.org/r/20250921-arm64-gcs-exit-token-v1-0-45cf64e648d5@kernel.org
but wasn't planning to refresh it without some indication from the
userspace side that that'd be useful.
Non-process cover letter:
The kernel has added support for shadow stacks, currently x86 only using
their CET feature, but both arm64 and RISC-V have equivalent features
(GCS and Zicfiss respectively); I am actively working on GCS[1]. With
shadow stacks the hardware maintains an additional stack containing only
the return addresses for branch instructions, which is not generally
writeable by userspace and ensures that any returns are to the recorded
addresses. This provides some protection against ROP attacks and makes
it easier to collect call stacks. These shadow stacks are allocated in
the address space of the userspace process.
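
For readers new to the feature, the return address check can be pictured
with a toy software model. This is purely illustrative: the names
(toy_shstk, toy_shstk_call(), toy_shstk_ret()) are made up for the sketch,
and real hardware does this implicitly on call and return instructions
and raises an exception on a mismatch rather than returning false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SHSTK_DEPTH 64
static_assert(TOY_SHSTK_DEPTH > 0, "toy shadow stack needs at least one slot");

/* Toy model: a second stack that holds nothing but return addresses. */
struct toy_shstk {
	uintptr_t slots[TOY_SHSTK_DEPTH];
	size_t top;	/* number of recorded return addresses */
};

/* On a call, the return address is also recorded on the shadow stack. */
bool toy_shstk_call(struct toy_shstk *ss, uintptr_t ret_addr)
{
	if (ss->top == TOY_SHSTK_DEPTH)
		return false;	/* shadow stack overflow */
	ss->slots[ss->top++] = ret_addr;
	return true;
}

/*
 * On a return, the address taken from the normal stack must match the
 * most recently recorded one. The entry is popped either way.
 */
bool toy_shstk_ret(struct toy_shstk *ss, uintptr_t ret_addr)
{
	if (ss->top == 0)
		return false;	/* nothing recorded, the return is rejected */
	return ss->slots[--ss->top] == ret_addr;
}
```

A return address overwritten on the normal stack no longer matches the
recorded copy, which is where the protection against ROP comes from.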
Our API for shadow stacks does not currently offer userspace any
flexibility for managing the allocation of shadow stacks for newly
created threads; instead the kernel allocates a new shadow stack with
the same size as the normal stack whenever a thread is created with the
feature enabled. The stacks allocated in this way are freed by the
kernel when the thread exits or shadow stacks are disabled for the
thread. This lack of flexibility and control isn't ideal: in the vast
majority of cases the shadow stack will be over allocated and the
implicit allocation and deallocation is not consistent with other
interfaces. As far as I can tell the interface is done in this manner
mainly because the shadow stack patches were in development since before
clone3() was implemented.
Since clone3() is readily extensible let's add support for specifying a
shadow stack when creating a new thread or process, keeping the current
implicit allocation behaviour if one is not specified either with
clone3() or through the use of clone(). The user must provide a shadow
stack pointer; this must point to memory mapped for use as a shadow
stack by map_shadow_stack() with an architecture specified shadow stack
token at the top of the stack.
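
As a rough sketch of what userspace would fill in, assuming the token
occupies the last pointer-sized slot of the mapping (which is what
map_shadow_stack() with SHADOW_STACK_SET_TOKEN produces on x86, other
architectures or flags may place it elsewhere). The struct is a local
mirror of the extended layout and the helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the 96 byte clone_args layout proposed by this series. */
struct clone_args_shstk {
	uint64_t flags;
	uint64_t pidfd;
	uint64_t child_tid;
	uint64_t parent_tid;
	uint64_t exit_signal;
	uint64_t stack;
	uint64_t stack_size;
	uint64_t tls;
	uint64_t set_tid;
	uint64_t set_tid_size;
	uint64_t cgroup;
	uint64_t shstk_token;
};
static_assert(sizeof(struct clone_args_shstk) == 96,
	      "expected CLONE_ARGS_SIZE_VER3");

/* Assumption: the token sits in the last pointer-sized slot of the mapping. */
uint64_t shstk_token_addr(uint64_t base, uint64_t size)
{
	return base + size - sizeof(void *);
}

/* Mirrors the sizeof(void *) alignment check in clone3_shadow_stack_valid(). */
bool shstk_token_aligned(uint64_t token)
{
	return token % sizeof(void *) == 0;
}

/*
 * Hypothetical helper: zero the arguments and point shstk_token at the
 * token of a shadow stack previously obtained from map_shadow_stack().
 * Returns false, leaving args untouched, if the result would be rejected
 * by the alignment check.
 */
bool fill_shstk_clone_args(struct clone_args_shstk *args,
			   uint64_t shstk_base, uint64_t shstk_size)
{
	uint64_t token;

	if (shstk_size < sizeof(void *))
		return false;

	token = shstk_token_addr(shstk_base, shstk_size);
	if (!shstk_token_aligned(token))
		return false;

	memset(args, 0, sizeof(*args));
	args->shstk_token = token;
	return true;
}
```

The caller would still set flags, stack and the other fields as usual
before passing the structure and its size to clone3().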
Yury Khrustalev has raised questions from the libc side regarding
discoverability of extended clone3() structure sizes[2]; this seems like
a general issue with clone3(). There was a suggestion to add a hwcap on
arm64 which isn't ideal but is doable there, though architecture
specific mechanisms would also be needed for x86 (and RISC-V if its
support gets merged before this does). The idea has, however, had
strong pushback from the architecture maintainers and it is possible to
detect support for this in clone3() by attempting a call with a
misaligned shadow stack pointer specified, so no hwcap has been added.
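
A probe along those lines might look like the sketch below. It is an
assumption-laden illustration rather than a recommended interface: the
struct is a local mirror of the extended layout, the enum and function
names are hypothetical, and the errno mapping reflects my reading of the
code (a kernel that knows the field rejects the misaligned token with
EINVAL, an older kernel sees non-zero bytes beyond the structure size it
knows and returns E2BIG). EINVAL only shows that the kernel understands
the field, it is also returned when the kernel was built without shadow
stack support.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Local mirror of the 96 byte clone_args layout proposed by this series. */
struct probe_clone_args {
	uint64_t flags, pidfd, child_tid, parent_tid, exit_signal;
	uint64_t stack, stack_size, tls, set_tid, set_tid_size, cgroup;
	uint64_t shstk_token;
};
static_assert(sizeof(struct probe_clone_args) == 96,
	      "expected CLONE_ARGS_SIZE_VER3");

enum shstk_probe {
	SHSTK_FIELD_KNOWN,		/* kernel parsed shstk_token */
	SHSTK_FIELD_UNKNOWN,		/* kernel predates the field */
	SHSTK_PROBE_INCONCLUSIVE,	/* e.g. ENOSYS or a seccomp filter */
};

/* Map the errno from the deliberately invalid call to a result. */
enum shstk_probe classify_probe_errno(int err)
{
	switch (err) {
	case EINVAL:
		return SHSTK_FIELD_KNOWN;
	case E2BIG:
		return SHSTK_FIELD_UNKNOWN;
	default:
		return SHSTK_PROBE_INCONCLUSIVE;
	}
}

/* Attempt clone3() with a misaligned token; the call is expected to fail. */
enum shstk_probe probe_clone3_shstk_token(void)
{
#ifdef SYS_clone3
	struct probe_clone_args args;
	long ret;

	memset(&args, 0, sizeof(args));
	args.exit_signal = SIGCHLD;
	args.shstk_token = 1;	/* never sizeof(void *) aligned */

	ret = syscall(SYS_clone3, &args, sizeof(args));
	if (ret == 0)
		_exit(0);	/* unexpected child, leave immediately */
	if (ret > 0) {
		waitpid((pid_t)ret, NULL, 0);	/* reap the unexpected child */
		return SHSTK_PROBE_INCONCLUSIVE;
	}
	return classify_probe_errno(errno);
#else
	return SHSTK_PROBE_INCONCLUSIVE;
#endif
}
```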
[1] https://lore.kernel.org/linux-arm-kernel/20241001-arm64-gcs-v13-0-222b78d87eee@kernel.org/T/#mc58f97f27461749ccf400ebabf6f9f937116a86b
[2] https://lore.kernel.org/r/aCs65ccRQtJBnZ_5@arm.com
Signed-off-by: Mark Brown <broonie@kernel.org>
---
Changes in v23:
- Rebase onto v6.19-rc1.
- Link to v22: https://lore.kernel.org/r/20251015-clone3-shadow-stack-v22-0-a8c8da011427@kernel.org
Changes in v22:
- Rebase onto v6.18-rc1.
- Cover letter updates.
- Link to v21: https://lore.kernel.org/r/20250916-clone3-shadow-stack-v21-0-910493527013@kernel.org
Changes in v21:
- Rebase onto https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git kernel-6.18.clone3
- Rename shadow_stack_token to shstk_token, since it's a simple rename I've
kept the acks and reviews but I dropped the tested-bys just to be safe.
- Link to v20: https://lore.kernel.org/r/20250902-clone3-shadow-stack-v20-0-4d9fff1c53e7@kernel.org
Changes in v20:
- Comment fixes and clarifications in x86 arch_shstk_validate_clone()
from Rick Edgecombe.
- Spelling fix in documentation.
- Link to v19: https://lore.kernel.org/r/20250819-clone3-shadow-stack-v19-0-bc957075479b@kernel.org
Changes in v19:
- Rebase onto v6.17-rc1.
- Link to v18: https://lore.kernel.org/r/20250702-clone3-shadow-stack-v18-0-7965d2b694db@kernel.org
Changes in v18:
- Rebase onto v6.16-rc3.
- Thanks to pointers from Yury Khrustalev this version has been tested
on x86 so I have removed the RFT tag.
- Clarify clone3_shadow_stack_valid() comment about the Kconfig check.
- Remove redundant GCSB DSYNCs in arm64 code.
- Fix token validation on x86.
- Link to v17: https://lore.kernel.org/r/20250609-clone3-shadow-stack-v17-0-8840ed97ff6f@kernel.org
Changes in v17:
- Rebase onto v6.16-rc1.
- Link to v16: https://lore.kernel.org/r/20250416-clone3-shadow-stack-v16-0-2ffc9ca3917b@kernel.org
Changes in v16:
- Rebase onto v6.15-rc2.
- Roll in fixes from x86 testing from Rick Edgecombe.
- Rework so that the argument is shadow_stack_token.
- Link to v15: https://lore.kernel.org/r/20250408-clone3-shadow-stack-v15-0-3fa245c6e3be@kernel.org
Changes in v15:
- Rebase onto v6.15-rc1.
- Link to v14: https://lore.kernel.org/r/20250206-clone3-shadow-stack-v14-0-805b53af73b9@kernel.org
Changes in v14:
- Rebase onto v6.14-rc1.
- Link to v13: https://lore.kernel.org/r/20241203-clone3-shadow-stack-v13-0-93b89a81a5ed@kernel.org
Changes in v13:
- Rebase onto v6.13-rc1.
- Link to v12: https://lore.kernel.org/r/20241031-clone3-shadow-stack-v12-0-7183eb8bee17@kernel.org
Changes in v12:
- Add the regular prctl() to the userspace API document since arm64
support is queued in -next.
- Link to v11: https://lore.kernel.org/r/20241005-clone3-shadow-stack-v11-0-2a6a2bd6d651@kernel.org
Changes in v11:
- Rebase onto arm64 for-next/gcs, which is based on v6.12-rc1, and
integrate arm64 support.
- Rework the interface to specify a shadow stack pointer rather than a
base and size like we do for the regular stack.
- Link to v10: https://lore.kernel.org/r/20240821-clone3-shadow-stack-v10-0-06e8797b9445@kernel.org
Changes in v10:
- Integrate fixes & improvements for the x86 implementation from Rick
Edgecombe.
- Require that the shadow stack be VM_WRITE.
- Require that the shadow stack base and size be sizeof(void *) aligned.
- Clean up trailing newline.
- Link to v9: https://lore.kernel.org/r/20240819-clone3-shadow-stack-v9-0-962d74f99464@kernel.org
Changes in v9:
- Pull token validation earlier and report problems with an error return
to parent rather than signal delivery to the child.
- Verify that the top of the supplied shadow stack is VM_SHADOW_STACK.
- Rework token validation to only do the page mapping once.
- Drop no longer needed support for testing for signals in selftest.
- Fix typo in comments.
- Link to v8: https://lore.kernel.org/r/20240808-clone3-shadow-stack-v8-0-0acf37caf14c@kernel.org
Changes in v8:
- Fix token verification with user specified shadow stack.
- Don't track user managed shadow stacks for child processes.
- Link to v7: https://lore.kernel.org/r/20240731-clone3-shadow-stack-v7-0-a9532eebfb1d@kernel.org
Changes in v7:
- Rebase onto v6.11-rc1.
- Typo fixes.
- Link to v6: https://lore.kernel.org/r/20240623-clone3-shadow-stack-v6-0-9ee7783b1fb9@kernel.org
Changes in v6:
- Rebase onto v6.10-rc3.
- Ensure we don't try to free the parent shadow stack in error paths of
x86 arch code.
- Spelling fixes in userspace API document.
- Additional cleanups and improvements to the clone3() tests to support
the shadow stack tests.
- Link to v5: https://lore.kernel.org/r/20240203-clone3-shadow-stack-v5-0-322c69598e4b@kernel.org
Changes in v5:
- Rebase onto v6.8-rc2.
- Rework ABI to have the user allocate the shadow stack memory with
map_shadow_stack() and a token.
- Force inlining of the x86 shadow stack enablement.
- Move shadow stack enablement out into a shared header for reuse by
other tests.
- Link to v4: https://lore.kernel.org/r/20231128-clone3-shadow-stack-v4-0-8b28ffe4f676@kernel.org
Changes in v4:
- Formatting changes.
- Use a define for minimum shadow stack size and move some basic
validation to fork.c.
- Link to v3: https://lore.kernel.org/r/20231120-clone3-shadow-stack-v3-0-a7b8ed3e2acc@kernel.org
Changes in v3:
- Rebase onto v6.7-rc2.
- Remove stale shadow_stack in internal kargs.
- If a shadow stack is specified unconditionally use it regardless of
CLONE_ parameters.
- Force enable shadow stacks in the selftest.
- Update changelogs for RISC-V feature rename.
- Link to v2: https://lore.kernel.org/r/20231114-clone3-shadow-stack-v2-0-b613f8681155@kernel.org
Changes in v2:
- Rebase onto v6.7-rc1.
- Remove ability to provide preallocated shadow stack, just specify the
desired size.
- Link to v1: https://lore.kernel.org/r/20231023-clone3-shadow-stack-v1-0-d867d0b5d4d0@kernel.org
---
Mark Brown (8):
arm64/gcs: Return a success value from gcs_alloc_thread_stack()
Documentation: userspace-api: Add shadow stack API documentation
selftests: Provide helper header for shadow stack testing
fork: Add shadow stack support to clone3()
selftests/clone3: Remove redundant flushes of output streams
selftests/clone3: Factor more of main loop into test_clone3()
selftests/clone3: Allow tests to flag if -E2BIG is a valid error code
selftests/clone3: Test shadow stack support
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/shadow_stack.rst | 44 +++++
arch/arm64/include/asm/gcs.h | 8 +-
arch/arm64/kernel/process.c | 8 +-
arch/arm64/mm/gcs.c | 55 +++++-
arch/x86/include/asm/shstk.h | 11 +-
arch/x86/kernel/process.c | 2 +-
arch/x86/kernel/shstk.c | 53 ++++-
include/asm-generic/cacheflush.h | 11 ++
include/linux/sched/task.h | 17 ++
include/uapi/linux/sched.h | 9 +-
kernel/fork.c | 93 +++++++--
tools/testing/selftests/clone3/clone3.c | 226 ++++++++++++++++++----
tools/testing/selftests/clone3/clone3_selftests.h | 65 ++++++-
tools/testing/selftests/ksft_shstk.h | 98 ++++++++++
15 files changed, 620 insertions(+), 81 deletions(-)
---
base-commit: 8f0b4cce4481fb22653697cced8d0d04027cb1e8
change-id: 20231019-clone3-shadow-stack-15d40d2bf536
Best regards,
--
Mark Brown <broonie@kernel.org>
* Re: [PATCH 1/6] uapi: promote EFSCORRUPTED and EUCLEAN to errno.h
From: Christoph Hellwig @ 2025-12-18 5:17 UTC (permalink / raw)
To: Darrick J. Wong
Cc: brauner, linux-api, linux-ext4, jack, linux-xfs, linux-fsdevel,
gabriel, amir73il, linux-man
In-Reply-To: <176602332146.686273.6355079912638580915.stgit@frogsfrogsfrogs>
On Wed, Dec 17, 2025 at 06:02:56PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Stop defining these privately and instead move them to the uapi
> errno.h so that they become canonical instead of copy pasta.
Sounds fine:
Reviewed-by: Christoph Hellwig <hch@lst.de>
Do we need to document these overlay errnos in the man pages,
though?
>
> Cc: linux-api@vger.kernel.org
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> ---
> arch/alpha/include/uapi/asm/errno.h | 2 ++
> arch/mips/include/uapi/asm/errno.h | 2 ++
> arch/parisc/include/uapi/asm/errno.h | 2 ++
> arch/sparc/include/uapi/asm/errno.h | 2 ++
> fs/erofs/internal.h | 2 --
> fs/ext2/ext2.h | 1 -
> fs/ext4/ext4.h | 3 ---
> fs/f2fs/f2fs.h | 3 ---
> fs/minix/minix.h | 2 --
> fs/udf/udf_sb.h | 2 --
> fs/xfs/xfs_linux.h | 2 --
> include/linux/jbd2.h | 3 ---
> include/uapi/asm-generic/errno.h | 2 ++
> tools/arch/alpha/include/uapi/asm/errno.h | 2 ++
> tools/arch/mips/include/uapi/asm/errno.h | 2 ++
> tools/arch/parisc/include/uapi/asm/errno.h | 2 ++
> tools/arch/sparc/include/uapi/asm/errno.h | 2 ++
> tools/include/uapi/asm-generic/errno.h | 2 ++
> 18 files changed, 20 insertions(+), 18 deletions(-)
>
>
> diff --git a/arch/alpha/include/uapi/asm/errno.h b/arch/alpha/include/uapi/asm/errno.h
> index 3d265f6babaf0a..6791f6508632ee 100644
> --- a/arch/alpha/include/uapi/asm/errno.h
> +++ b/arch/alpha/include/uapi/asm/errno.h
> @@ -55,6 +55,7 @@
> #define ENOSR 82 /* Out of streams resources */
> #define ETIME 83 /* Timer expired */
> #define EBADMSG 84 /* Not a data message */
> +#define EFSBADCRC EBADMSG /* Bad CRC detected */
> #define EPROTO 85 /* Protocol error */
> #define ENODATA 86 /* No data available */
> #define ENOSTR 87 /* Device not a stream */
> @@ -96,6 +97,7 @@
> #define EREMCHG 115 /* Remote address changed */
>
> #define EUCLEAN 117 /* Structure needs cleaning */
> +#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
> #define ENOTNAM 118 /* Not a XENIX named type file */
> #define ENAVAIL 119 /* No XENIX semaphores available */
> #define EISNAM 120 /* Is a named type file */
> diff --git a/arch/mips/include/uapi/asm/errno.h b/arch/mips/include/uapi/asm/errno.h
> index 2fb714e2d6d8fc..c01ed91b1ef44b 100644
> --- a/arch/mips/include/uapi/asm/errno.h
> +++ b/arch/mips/include/uapi/asm/errno.h
> @@ -50,6 +50,7 @@
> #define EDOTDOT 73 /* RFS specific error */
> #define EMULTIHOP 74 /* Multihop attempted */
> #define EBADMSG 77 /* Not a data message */
> +#define EFSBADCRC EBADMSG /* Bad CRC detected */
> #define ENAMETOOLONG 78 /* File name too long */
> #define EOVERFLOW 79 /* Value too large for defined data type */
> #define ENOTUNIQ 80 /* Name not unique on network */
> @@ -88,6 +89,7 @@
> #define EISCONN 133 /* Transport endpoint is already connected */
> #define ENOTCONN 134 /* Transport endpoint is not connected */
> #define EUCLEAN 135 /* Structure needs cleaning */
> +#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
> #define ENOTNAM 137 /* Not a XENIX named type file */
> #define ENAVAIL 138 /* No XENIX semaphores available */
> #define EISNAM 139 /* Is a named type file */
> diff --git a/arch/parisc/include/uapi/asm/errno.h b/arch/parisc/include/uapi/asm/errno.h
> index 8d94739d75c67c..8cbc07c1903e4c 100644
> --- a/arch/parisc/include/uapi/asm/errno.h
> +++ b/arch/parisc/include/uapi/asm/errno.h
> @@ -36,6 +36,7 @@
>
> #define EDOTDOT 66 /* RFS specific error */
> #define EBADMSG 67 /* Not a data message */
> +#define EFSBADCRC EBADMSG /* Bad CRC detected */
> #define EUSERS 68 /* Too many users */
> #define EDQUOT 69 /* Quota exceeded */
> #define ESTALE 70 /* Stale file handle */
> @@ -62,6 +63,7 @@
> #define ERESTART 175 /* Interrupted system call should be restarted */
> #define ESTRPIPE 176 /* Streams pipe error */
> #define EUCLEAN 177 /* Structure needs cleaning */
> +#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
> #define ENOTNAM 178 /* Not a XENIX named type file */
> #define ENAVAIL 179 /* No XENIX semaphores available */
> #define EISNAM 180 /* Is a named type file */
> diff --git a/arch/sparc/include/uapi/asm/errno.h b/arch/sparc/include/uapi/asm/errno.h
> index 81a732b902ee38..4a41e7835fd5b8 100644
> --- a/arch/sparc/include/uapi/asm/errno.h
> +++ b/arch/sparc/include/uapi/asm/errno.h
> @@ -48,6 +48,7 @@
> #define ENOSR 74 /* Out of streams resources */
> #define ENOMSG 75 /* No message of desired type */
> #define EBADMSG 76 /* Not a data message */
> +#define EFSBADCRC EBADMSG /* Bad CRC detected */
> #define EIDRM 77 /* Identifier removed */
> #define EDEADLK 78 /* Resource deadlock would occur */
> #define ENOLCK 79 /* No record locks available */
> @@ -91,6 +92,7 @@
> #define ENOTUNIQ 115 /* Name not unique on network */
> #define ERESTART 116 /* Interrupted syscall should be restarted */
> #define EUCLEAN 117 /* Structure needs cleaning */
> +#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
> #define ENOTNAM 118 /* Not a XENIX named type file */
> #define ENAVAIL 119 /* No XENIX semaphores available */
> #define EISNAM 120 /* Is a named type file */
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index f7f622836198da..d06e99baf5d5ae 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -541,6 +541,4 @@ long erofs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
> long erofs_compat_ioctl(struct file *filp, unsigned int cmd,
> unsigned long arg);
>
> -#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
> -
> #endif /* __EROFS_INTERNAL_H */
> diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
> index cf97b76e9fd3e9..5e0c6c5fcb6cd6 100644
> --- a/fs/ext2/ext2.h
> +++ b/fs/ext2/ext2.h
> @@ -357,7 +357,6 @@ struct ext2_inode {
> */
> #define EXT2_VALID_FS 0x0001 /* Unmounted cleanly */
> #define EXT2_ERROR_FS 0x0002 /* Errors detected */
> -#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
>
> /*
> * Mount flags
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 56112f201cace7..62c091b52bacdf 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -3938,7 +3938,4 @@ extern int ext4_block_write_begin(handle_t *handle, struct folio *folio,
> get_block_t *get_block);
> #endif /* __KERNEL__ */
>
> -#define EFSBADCRC EBADMSG /* Bad CRC detected */
> -#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
> -
> #endif /* _EXT4_H */
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 20edbb99b814a7..9f3aa3c7f12613 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -5004,7 +5004,4 @@ static inline void f2fs_invalidate_internal_cache(struct f2fs_sb_info *sbi,
> f2fs_invalidate_compress_pages_range(sbi, blkaddr, len);
> }
>
> -#define EFSBADCRC EBADMSG /* Bad CRC detected */
> -#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
> -
> #endif /* _LINUX_F2FS_H */
> diff --git a/fs/minix/minix.h b/fs/minix/minix.h
> index 2bfaf377f2086c..7e1f652f16d311 100644
> --- a/fs/minix/minix.h
> +++ b/fs/minix/minix.h
> @@ -175,6 +175,4 @@ static inline int minix_test_bit(int nr, const void *vaddr)
> __minix_error_inode((inode), __func__, __LINE__, \
> (fmt), ##__VA_ARGS__)
>
> -#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
> -
> #endif /* FS_MINIX_H */
> diff --git a/fs/udf/udf_sb.h b/fs/udf/udf_sb.h
> index 08ec8756b9487b..8399accc788dea 100644
> --- a/fs/udf/udf_sb.h
> +++ b/fs/udf/udf_sb.h
> @@ -55,8 +55,6 @@
> #define MF_DUPLICATE_MD 0x01
> #define MF_MIRROR_FE_LOADED 0x02
>
> -#define EFSCORRUPTED EUCLEAN
> -
> struct udf_meta_data {
> __u32 s_meta_file_loc;
> __u32 s_mirror_file_loc;
> diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> index 4dd747bdbccab2..55064228c4d574 100644
> --- a/fs/xfs/xfs_linux.h
> +++ b/fs/xfs/xfs_linux.h
> @@ -121,8 +121,6 @@ typedef __u32 xfs_nlink_t;
>
> #define ENOATTR ENODATA /* Attribute not found */
> #define EWRONGFS EINVAL /* Mount with wrong filesystem type */
> -#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
> -#define EFSBADCRC EBADMSG /* Bad CRC detected */
>
> #define __return_address __builtin_return_address(0)
>
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index f5eaf76198f377..a53a00d36228ce 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -1815,7 +1815,4 @@ static inline int jbd2_handle_buffer_credits(handle_t *handle)
>
> #endif /* __KERNEL__ */
>
> -#define EFSBADCRC EBADMSG /* Bad CRC detected */
> -#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
> -
> #endif /* _LINUX_JBD2_H */
> diff --git a/include/uapi/asm-generic/errno.h b/include/uapi/asm-generic/errno.h
> index cf9c51ac49f97e..92e7ae493ee315 100644
> --- a/include/uapi/asm-generic/errno.h
> +++ b/include/uapi/asm-generic/errno.h
> @@ -55,6 +55,7 @@
> #define EMULTIHOP 72 /* Multihop attempted */
> #define EDOTDOT 73 /* RFS specific error */
> #define EBADMSG 74 /* Not a data message */
> +#define EFSBADCRC EBADMSG /* Bad CRC detected */
> #define EOVERFLOW 75 /* Value too large for defined data type */
> #define ENOTUNIQ 76 /* Name not unique on network */
> #define EBADFD 77 /* File descriptor in bad state */
> @@ -98,6 +99,7 @@
> #define EINPROGRESS 115 /* Operation now in progress */
> #define ESTALE 116 /* Stale file handle */
> #define EUCLEAN 117 /* Structure needs cleaning */
> +#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
> #define ENOTNAM 118 /* Not a XENIX named type file */
> #define ENAVAIL 119 /* No XENIX semaphores available */
> #define EISNAM 120 /* Is a named type file */
> diff --git a/tools/arch/alpha/include/uapi/asm/errno.h b/tools/arch/alpha/include/uapi/asm/errno.h
> index 3d265f6babaf0a..6791f6508632ee 100644
> --- a/tools/arch/alpha/include/uapi/asm/errno.h
> +++ b/tools/arch/alpha/include/uapi/asm/errno.h
> @@ -55,6 +55,7 @@
> #define ENOSR 82 /* Out of streams resources */
> #define ETIME 83 /* Timer expired */
> #define EBADMSG 84 /* Not a data message */
> +#define EFSBADCRC EBADMSG /* Bad CRC detected */
> #define EPROTO 85 /* Protocol error */
> #define ENODATA 86 /* No data available */
> #define ENOSTR 87 /* Device not a stream */
> @@ -96,6 +97,7 @@
> #define EREMCHG 115 /* Remote address changed */
>
> #define EUCLEAN 117 /* Structure needs cleaning */
> +#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
> #define ENOTNAM 118 /* Not a XENIX named type file */
> #define ENAVAIL 119 /* No XENIX semaphores available */
> #define EISNAM 120 /* Is a named type file */
> diff --git a/tools/arch/mips/include/uapi/asm/errno.h b/tools/arch/mips/include/uapi/asm/errno.h
> index 2fb714e2d6d8fc..c01ed91b1ef44b 100644
> --- a/tools/arch/mips/include/uapi/asm/errno.h
> +++ b/tools/arch/mips/include/uapi/asm/errno.h
> @@ -50,6 +50,7 @@
> #define EDOTDOT 73 /* RFS specific error */
> #define EMULTIHOP 74 /* Multihop attempted */
> #define EBADMSG 77 /* Not a data message */
> +#define EFSBADCRC EBADMSG /* Bad CRC detected */
> #define ENAMETOOLONG 78 /* File name too long */
> #define EOVERFLOW 79 /* Value too large for defined data type */
> #define ENOTUNIQ 80 /* Name not unique on network */
> @@ -88,6 +89,7 @@
> #define EISCONN 133 /* Transport endpoint is already connected */
> #define ENOTCONN 134 /* Transport endpoint is not connected */
> #define EUCLEAN 135 /* Structure needs cleaning */
> +#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
> #define ENOTNAM 137 /* Not a XENIX named type file */
> #define ENAVAIL 138 /* No XENIX semaphores available */
> #define EISNAM 139 /* Is a named type file */
> diff --git a/tools/arch/parisc/include/uapi/asm/errno.h b/tools/arch/parisc/include/uapi/asm/errno.h
> index 8d94739d75c67c..8cbc07c1903e4c 100644
> --- a/tools/arch/parisc/include/uapi/asm/errno.h
> +++ b/tools/arch/parisc/include/uapi/asm/errno.h
> @@ -36,6 +36,7 @@
>
> #define EDOTDOT 66 /* RFS specific error */
> #define EBADMSG 67 /* Not a data message */
> +#define EFSBADCRC EBADMSG /* Bad CRC detected */
> #define EUSERS 68 /* Too many users */
> #define EDQUOT 69 /* Quota exceeded */
> #define ESTALE 70 /* Stale file handle */
> @@ -62,6 +63,7 @@
> #define ERESTART 175 /* Interrupted system call should be restarted */
> #define ESTRPIPE 176 /* Streams pipe error */
> #define EUCLEAN 177 /* Structure needs cleaning */
> +#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
> #define ENOTNAM 178 /* Not a XENIX named type file */
> #define ENAVAIL 179 /* No XENIX semaphores available */
> #define EISNAM 180 /* Is a named type file */
> diff --git a/tools/arch/sparc/include/uapi/asm/errno.h b/tools/arch/sparc/include/uapi/asm/errno.h
> index 81a732b902ee38..4a41e7835fd5b8 100644
> --- a/tools/arch/sparc/include/uapi/asm/errno.h
> +++ b/tools/arch/sparc/include/uapi/asm/errno.h
> @@ -48,6 +48,7 @@
> #define ENOSR 74 /* Out of streams resources */
> #define ENOMSG 75 /* No message of desired type */
> #define EBADMSG 76 /* Not a data message */
> +#define EFSBADCRC EBADMSG /* Bad CRC detected */
> #define EIDRM 77 /* Identifier removed */
> #define EDEADLK 78 /* Resource deadlock would occur */
> #define ENOLCK 79 /* No record locks available */
> @@ -91,6 +92,7 @@
> #define ENOTUNIQ 115 /* Name not unique on network */
> #define ERESTART 116 /* Interrupted syscall should be restarted */
> #define EUCLEAN 117 /* Structure needs cleaning */
> +#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
> #define ENOTNAM 118 /* Not a XENIX named type file */
> #define ENAVAIL 119 /* No XENIX semaphores available */
> #define EISNAM 120 /* Is a named type file */
> diff --git a/tools/include/uapi/asm-generic/errno.h b/tools/include/uapi/asm-generic/errno.h
> index cf9c51ac49f97e..92e7ae493ee315 100644
> --- a/tools/include/uapi/asm-generic/errno.h
> +++ b/tools/include/uapi/asm-generic/errno.h
> @@ -55,6 +55,7 @@
> #define EMULTIHOP 72 /* Multihop attempted */
> #define EDOTDOT 73 /* RFS specific error */
> #define EBADMSG 74 /* Not a data message */
> +#define EFSBADCRC EBADMSG /* Bad CRC detected */
> #define EOVERFLOW 75 /* Value too large for defined data type */
> #define ENOTUNIQ 76 /* Name not unique on network */
> #define EBADFD 77 /* File descriptor in bad state */
> @@ -98,6 +99,7 @@
> #define EINPROGRESS 115 /* Operation now in progress */
> #define ESTALE 116 /* Stale file handle */
> #define EUCLEAN 117 /* Structure needs cleaning */
> +#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
> #define ENOTNAM 118 /* Not a XENIX named type file */
> #define ENAVAIL 119 /* No XENIX semaphores available */
> #define EISNAM 120 /* Is a named type file */
>
>
---end quoted text---
^ permalink raw reply
* [PATCH 1/6] uapi: promote EFSCORRUPTED and EUCLEAN to errno.h
From: Darrick J. Wong @ 2025-12-18 2:02 UTC (permalink / raw)
To: brauner, djwong
Cc: linux-api, linux-ext4, jack, linux-xfs, linux-fsdevel, gabriel,
hch, amir73il
In-Reply-To: <176602332085.686273.7564676516217176769.stgit@frogsfrogsfrogs>
From: Darrick J. Wong <djwong@kernel.org>
Stop defining these privately and instead move them to the uapi
errno.h so that they become canonical instead of copy pasta.
Cc: linux-api@vger.kernel.org
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
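For userspace that wants to use the new names while still building
against older headers, a minimal sketch follows. It relies only on what
this patch defines: EFSCORRUPTED aliases EUCLEAN and EFSBADCRC aliases
EBADMSG. The enum and fs_error_kind() are hypothetical names used for
illustration and are not part of the patch.

```c
#include <errno.h>

/*
 * Fallbacks for headers that predate this patch. The values mirror the
 * aliases added to the uapi errno.h files.
 */
#ifndef EFSCORRUPTED
#define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
#endif
#ifndef EFSBADCRC
#define EFSBADCRC	EBADMSG		/* Bad CRC detected */
#endif

/* Hypothetical classification used only in this example. */
enum fs_error_kind {
	FS_ERR_OTHER,
	FS_ERR_CORRUPTED,
	FS_ERR_BADCRC,
};

/* Map a positive errno value to one of the kinds above. */
static enum fs_error_kind fs_error_kind(int err)
{
	if (err == EFSCORRUPTED)
		return FS_ERR_CORRUPTED;
	if (err == EFSBADCRC)
		return FS_ERR_BADCRC;
	return FS_ERR_OTHER;
}
```

Because the new names are plain aliases, callers that already test for
EUCLEAN or EBADMSG should see no change in behaviour.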
arch/alpha/include/uapi/asm/errno.h | 2 ++
arch/mips/include/uapi/asm/errno.h | 2 ++
arch/parisc/include/uapi/asm/errno.h | 2 ++
arch/sparc/include/uapi/asm/errno.h | 2 ++
fs/erofs/internal.h | 2 --
fs/ext2/ext2.h | 1 -
fs/ext4/ext4.h | 3 ---
fs/f2fs/f2fs.h | 3 ---
fs/minix/minix.h | 2 --
fs/udf/udf_sb.h | 2 --
fs/xfs/xfs_linux.h | 2 --
include/linux/jbd2.h | 3 ---
include/uapi/asm-generic/errno.h | 2 ++
tools/arch/alpha/include/uapi/asm/errno.h | 2 ++
tools/arch/mips/include/uapi/asm/errno.h | 2 ++
tools/arch/parisc/include/uapi/asm/errno.h | 2 ++
tools/arch/sparc/include/uapi/asm/errno.h | 2 ++
tools/include/uapi/asm-generic/errno.h | 2 ++
18 files changed, 20 insertions(+), 18 deletions(-)
diff --git a/arch/alpha/include/uapi/asm/errno.h b/arch/alpha/include/uapi/asm/errno.h
index 3d265f6babaf0a..6791f6508632ee 100644
--- a/arch/alpha/include/uapi/asm/errno.h
+++ b/arch/alpha/include/uapi/asm/errno.h
@@ -55,6 +55,7 @@
#define ENOSR 82 /* Out of streams resources */
#define ETIME 83 /* Timer expired */
#define EBADMSG 84 /* Not a data message */
+#define EFSBADCRC EBADMSG /* Bad CRC detected */
#define EPROTO 85 /* Protocol error */
#define ENODATA 86 /* No data available */
#define ENOSTR 87 /* Device not a stream */
@@ -96,6 +97,7 @@
#define EREMCHG 115 /* Remote address changed */
#define EUCLEAN 117 /* Structure needs cleaning */
+#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
#define ENOTNAM 118 /* Not a XENIX named type file */
#define ENAVAIL 119 /* No XENIX semaphores available */
#define EISNAM 120 /* Is a named type file */
diff --git a/arch/mips/include/uapi/asm/errno.h b/arch/mips/include/uapi/asm/errno.h
index 2fb714e2d6d8fc..c01ed91b1ef44b 100644
--- a/arch/mips/include/uapi/asm/errno.h
+++ b/arch/mips/include/uapi/asm/errno.h
@@ -50,6 +50,7 @@
#define EDOTDOT 73 /* RFS specific error */
#define EMULTIHOP 74 /* Multihop attempted */
#define EBADMSG 77 /* Not a data message */
+#define EFSBADCRC EBADMSG /* Bad CRC detected */
#define ENAMETOOLONG 78 /* File name too long */
#define EOVERFLOW 79 /* Value too large for defined data type */
#define ENOTUNIQ 80 /* Name not unique on network */
@@ -88,6 +89,7 @@
#define EISCONN 133 /* Transport endpoint is already connected */
#define ENOTCONN 134 /* Transport endpoint is not connected */
#define EUCLEAN 135 /* Structure needs cleaning */
+#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
#define ENOTNAM 137 /* Not a XENIX named type file */
#define ENAVAIL 138 /* No XENIX semaphores available */
#define EISNAM 139 /* Is a named type file */
diff --git a/arch/parisc/include/uapi/asm/errno.h b/arch/parisc/include/uapi/asm/errno.h
index 8d94739d75c67c..8cbc07c1903e4c 100644
--- a/arch/parisc/include/uapi/asm/errno.h
+++ b/arch/parisc/include/uapi/asm/errno.h
@@ -36,6 +36,7 @@
#define EDOTDOT 66 /* RFS specific error */
#define EBADMSG 67 /* Not a data message */
+#define EFSBADCRC EBADMSG /* Bad CRC detected */
#define EUSERS 68 /* Too many users */
#define EDQUOT 69 /* Quota exceeded */
#define ESTALE 70 /* Stale file handle */
@@ -62,6 +63,7 @@
#define ERESTART 175 /* Interrupted system call should be restarted */
#define ESTRPIPE 176 /* Streams pipe error */
#define EUCLEAN 177 /* Structure needs cleaning */
+#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
#define ENOTNAM 178 /* Not a XENIX named type file */
#define ENAVAIL 179 /* No XENIX semaphores available */
#define EISNAM 180 /* Is a named type file */
diff --git a/arch/sparc/include/uapi/asm/errno.h b/arch/sparc/include/uapi/asm/errno.h
index 81a732b902ee38..4a41e7835fd5b8 100644
--- a/arch/sparc/include/uapi/asm/errno.h
+++ b/arch/sparc/include/uapi/asm/errno.h
@@ -48,6 +48,7 @@
#define ENOSR 74 /* Out of streams resources */
#define ENOMSG 75 /* No message of desired type */
#define EBADMSG 76 /* Not a data message */
+#define EFSBADCRC EBADMSG /* Bad CRC detected */
#define EIDRM 77 /* Identifier removed */
#define EDEADLK 78 /* Resource deadlock would occur */
#define ENOLCK 79 /* No record locks available */
@@ -91,6 +92,7 @@
#define ENOTUNIQ 115 /* Name not unique on network */
#define ERESTART 116 /* Interrupted syscall should be restarted */
#define EUCLEAN 117 /* Structure needs cleaning */
+#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
#define ENOTNAM 118 /* Not a XENIX named type file */
#define ENAVAIL 119 /* No XENIX semaphores available */
#define EISNAM 120 /* Is a named type file */
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index f7f622836198da..d06e99baf5d5ae 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -541,6 +541,4 @@ long erofs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
long erofs_compat_ioctl(struct file *filp, unsigned int cmd,
unsigned long arg);
-#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
-
#endif /* __EROFS_INTERNAL_H */
diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
index cf97b76e9fd3e9..5e0c6c5fcb6cd6 100644
--- a/fs/ext2/ext2.h
+++ b/fs/ext2/ext2.h
@@ -357,7 +357,6 @@ struct ext2_inode {
*/
#define EXT2_VALID_FS 0x0001 /* Unmounted cleanly */
#define EXT2_ERROR_FS 0x0002 /* Errors detected */
-#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
/*
* Mount flags
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 56112f201cace7..62c091b52bacdf 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3938,7 +3938,4 @@ extern int ext4_block_write_begin(handle_t *handle, struct folio *folio,
get_block_t *get_block);
#endif /* __KERNEL__ */
-#define EFSBADCRC EBADMSG /* Bad CRC detected */
-#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
-
#endif /* _EXT4_H */
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 20edbb99b814a7..9f3aa3c7f12613 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -5004,7 +5004,4 @@ static inline void f2fs_invalidate_internal_cache(struct f2fs_sb_info *sbi,
f2fs_invalidate_compress_pages_range(sbi, blkaddr, len);
}
-#define EFSBADCRC EBADMSG /* Bad CRC detected */
-#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
-
#endif /* _LINUX_F2FS_H */
diff --git a/fs/minix/minix.h b/fs/minix/minix.h
index 2bfaf377f2086c..7e1f652f16d311 100644
--- a/fs/minix/minix.h
+++ b/fs/minix/minix.h
@@ -175,6 +175,4 @@ static inline int minix_test_bit(int nr, const void *vaddr)
__minix_error_inode((inode), __func__, __LINE__, \
(fmt), ##__VA_ARGS__)
-#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
-
#endif /* FS_MINIX_H */
diff --git a/fs/udf/udf_sb.h b/fs/udf/udf_sb.h
index 08ec8756b9487b..8399accc788dea 100644
--- a/fs/udf/udf_sb.h
+++ b/fs/udf/udf_sb.h
@@ -55,8 +55,6 @@
#define MF_DUPLICATE_MD 0x01
#define MF_MIRROR_FE_LOADED 0x02
-#define EFSCORRUPTED EUCLEAN
-
struct udf_meta_data {
__u32 s_meta_file_loc;
__u32 s_mirror_file_loc;
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index 4dd747bdbccab2..55064228c4d574 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -121,8 +121,6 @@ typedef __u32 xfs_nlink_t;
#define ENOATTR ENODATA /* Attribute not found */
#define EWRONGFS EINVAL /* Mount with wrong filesystem type */
-#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
-#define EFSBADCRC EBADMSG /* Bad CRC detected */
#define __return_address __builtin_return_address(0)
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index f5eaf76198f377..a53a00d36228ce 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1815,7 +1815,4 @@ static inline int jbd2_handle_buffer_credits(handle_t *handle)
#endif /* __KERNEL__ */
-#define EFSBADCRC EBADMSG /* Bad CRC detected */
-#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
-
#endif /* _LINUX_JBD2_H */
diff --git a/include/uapi/asm-generic/errno.h b/include/uapi/asm-generic/errno.h
index cf9c51ac49f97e..92e7ae493ee315 100644
--- a/include/uapi/asm-generic/errno.h
+++ b/include/uapi/asm-generic/errno.h
@@ -55,6 +55,7 @@
#define EMULTIHOP 72 /* Multihop attempted */
#define EDOTDOT 73 /* RFS specific error */
#define EBADMSG 74 /* Not a data message */
+#define EFSBADCRC EBADMSG /* Bad CRC detected */
#define EOVERFLOW 75 /* Value too large for defined data type */
#define ENOTUNIQ 76 /* Name not unique on network */
#define EBADFD 77 /* File descriptor in bad state */
@@ -98,6 +99,7 @@
#define EINPROGRESS 115 /* Operation now in progress */
#define ESTALE 116 /* Stale file handle */
#define EUCLEAN 117 /* Structure needs cleaning */
+#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
#define ENOTNAM 118 /* Not a XENIX named type file */
#define ENAVAIL 119 /* No XENIX semaphores available */
#define EISNAM 120 /* Is a named type file */
diff --git a/tools/arch/alpha/include/uapi/asm/errno.h b/tools/arch/alpha/include/uapi/asm/errno.h
index 3d265f6babaf0a..6791f6508632ee 100644
--- a/tools/arch/alpha/include/uapi/asm/errno.h
+++ b/tools/arch/alpha/include/uapi/asm/errno.h
@@ -55,6 +55,7 @@
#define ENOSR 82 /* Out of streams resources */
#define ETIME 83 /* Timer expired */
#define EBADMSG 84 /* Not a data message */
+#define EFSBADCRC EBADMSG /* Bad CRC detected */
#define EPROTO 85 /* Protocol error */
#define ENODATA 86 /* No data available */
#define ENOSTR 87 /* Device not a stream */
@@ -96,6 +97,7 @@
#define EREMCHG 115 /* Remote address changed */
#define EUCLEAN 117 /* Structure needs cleaning */
+#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
#define ENOTNAM 118 /* Not a XENIX named type file */
#define ENAVAIL 119 /* No XENIX semaphores available */
#define EISNAM 120 /* Is a named type file */
diff --git a/tools/arch/mips/include/uapi/asm/errno.h b/tools/arch/mips/include/uapi/asm/errno.h
index 2fb714e2d6d8fc..c01ed91b1ef44b 100644
--- a/tools/arch/mips/include/uapi/asm/errno.h
+++ b/tools/arch/mips/include/uapi/asm/errno.h
@@ -50,6 +50,7 @@
#define EDOTDOT 73 /* RFS specific error */
#define EMULTIHOP 74 /* Multihop attempted */
#define EBADMSG 77 /* Not a data message */
+#define EFSBADCRC EBADMSG /* Bad CRC detected */
#define ENAMETOOLONG 78 /* File name too long */
#define EOVERFLOW 79 /* Value too large for defined data type */
#define ENOTUNIQ 80 /* Name not unique on network */
@@ -88,6 +89,7 @@
#define EISCONN 133 /* Transport endpoint is already connected */
#define ENOTCONN 134 /* Transport endpoint is not connected */
#define EUCLEAN 135 /* Structure needs cleaning */
+#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
#define ENOTNAM 137 /* Not a XENIX named type file */
#define ENAVAIL 138 /* No XENIX semaphores available */
#define EISNAM 139 /* Is a named type file */
diff --git a/tools/arch/parisc/include/uapi/asm/errno.h b/tools/arch/parisc/include/uapi/asm/errno.h
index 8d94739d75c67c..8cbc07c1903e4c 100644
--- a/tools/arch/parisc/include/uapi/asm/errno.h
+++ b/tools/arch/parisc/include/uapi/asm/errno.h
@@ -36,6 +36,7 @@
#define EDOTDOT 66 /* RFS specific error */
#define EBADMSG 67 /* Not a data message */
+#define EFSBADCRC EBADMSG /* Bad CRC detected */
#define EUSERS 68 /* Too many users */
#define EDQUOT 69 /* Quota exceeded */
#define ESTALE 70 /* Stale file handle */
@@ -62,6 +63,7 @@
#define ERESTART 175 /* Interrupted system call should be restarted */
#define ESTRPIPE 176 /* Streams pipe error */
#define EUCLEAN 177 /* Structure needs cleaning */
+#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
#define ENOTNAM 178 /* Not a XENIX named type file */
#define ENAVAIL 179 /* No XENIX semaphores available */
#define EISNAM 180 /* Is a named type file */
diff --git a/tools/arch/sparc/include/uapi/asm/errno.h b/tools/arch/sparc/include/uapi/asm/errno.h
index 81a732b902ee38..4a41e7835fd5b8 100644
--- a/tools/arch/sparc/include/uapi/asm/errno.h
+++ b/tools/arch/sparc/include/uapi/asm/errno.h
@@ -48,6 +48,7 @@
#define ENOSR 74 /* Out of streams resources */
#define ENOMSG 75 /* No message of desired type */
#define EBADMSG 76 /* Not a data message */
+#define EFSBADCRC EBADMSG /* Bad CRC detected */
#define EIDRM 77 /* Identifier removed */
#define EDEADLK 78 /* Resource deadlock would occur */
#define ENOLCK 79 /* No record locks available */
@@ -91,6 +92,7 @@
#define ENOTUNIQ 115 /* Name not unique on network */
#define ERESTART 116 /* Interrupted syscall should be restarted */
#define EUCLEAN 117 /* Structure needs cleaning */
+#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
#define ENOTNAM 118 /* Not a XENIX named type file */
#define ENAVAIL 119 /* No XENIX semaphores available */
#define EISNAM 120 /* Is a named type file */
diff --git a/tools/include/uapi/asm-generic/errno.h b/tools/include/uapi/asm-generic/errno.h
index cf9c51ac49f97e..92e7ae493ee315 100644
--- a/tools/include/uapi/asm-generic/errno.h
+++ b/tools/include/uapi/asm-generic/errno.h
@@ -55,6 +55,7 @@
#define EMULTIHOP 72 /* Multihop attempted */
#define EDOTDOT 73 /* RFS specific error */
#define EBADMSG 74 /* Not a data message */
+#define EFSBADCRC EBADMSG /* Bad CRC detected */
#define EOVERFLOW 75 /* Value too large for defined data type */
#define ENOTUNIQ 76 /* Name not unique on network */
#define EBADFD 77 /* File descriptor in bad state */
@@ -98,6 +99,7 @@
#define EINPROGRESS 115 /* Operation now in progress */
#define ESTALE 116 /* Stale file handle */
#define EUCLEAN 117 /* Structure needs cleaning */
+#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
#define ENOTNAM 118 /* Not a XENIX named type file */
#define ENAVAIL 119 /* No XENIX semaphores available */
#define EISNAM 120 /* Is a named type file */
^ permalink raw reply related
* [PATCHSET V4 1/2] fs: generic file IO error reporting
From: Darrick J. Wong @ 2025-12-18 2:02 UTC (permalink / raw)
To: brauner, djwong
Cc: linux-api, hch, linux-ext4, jack, linux-xfs, linux-fsdevel,
gabriel, hch, amir73il
Hi all,
This patchset adds some generic helpers so that filesystems can report
errors to fsnotify in a standard way. Then it adapts iomap to use the
generic helpers so that any iomap-enabled filesystem can report I/O
errors through this mechanism as well. Finally, it makes XFS report
metadata errors through this mechanism in much the same way that ext4
does now.
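As a rough illustration of the consumer side, the sketch below pulls the
error code out of one fanotify event. It assumes that errors reported
through these helpers reach userspace as the existing FAN_FS_ERROR
events, as ext4's do today; the series itself is the authority on that.
fs_error_from_event() is a hypothetical helper name, and the structures
come from the fanotify UAPI header.

```c
#include <stddef.h>
#include <sys/fanotify.h>

/*
 * Walk the info records that follow one fanotify event and copy out the
 * error code and count from the FAN_EVENT_INFO_TYPE_ERROR record.
 * Returns 1 if such a record was found, 0 otherwise.
 */
static int fs_error_from_event(const struct fanotify_event_metadata *md,
			       int *error, unsigned int *count)
{
	const char *base = (const char *)md;
	size_t off = md->metadata_len;

	if (!(md->mask & FAN_FS_ERROR))
		return 0;

	while (off + sizeof(struct fanotify_event_info_header) <= md->event_len) {
		const struct fanotify_event_info_header *hdr =
			(const struct fanotify_event_info_header *)(base + off);

		/* Stop on a malformed or truncated record. */
		if (hdr->len < sizeof(*hdr) || off + hdr->len > md->event_len)
			break;

		if (hdr->info_type == FAN_EVENT_INFO_TYPE_ERROR &&
		    hdr->len >= sizeof(struct fanotify_event_info_error)) {
			const struct fanotify_event_info_error *err =
				(const struct fanotify_event_info_error *)hdr;

			*error = err->error;
			*count = err->error_count;
			return 1;
		}
		off += hdr->len;
	}
	return 0;
}
```

A listener would typically obtain such events from a group created with
fanotify_init(FAN_CLASS_NOTIF | FAN_REPORT_FID, O_RDONLY) and a
filesystem-wide mark for FAN_FS_ERROR, both of which need CAP_SYS_ADMIN.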
These are a prerequisite for the XFS self-healing V4 series which will
come at a later time.
If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.
This has been running on the djcloud for months with no problems. Enjoy!
Comments and questions are, as always, welcome.
--D
kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=filesystem-error-reporting
fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=filesystem-error-reporting
---
Commits in this patchset:
* uapi: promote EFSCORRUPTED and EUCLEAN to errno.h
* fs: report filesystem and file I/O errors to fsnotify
* iomap: report file I/O errors to the VFS
* xfs: report fs metadata errors via fsnotify
* xfs: translate fsdax media errors into file "data lost" errors when convenient
* ext4: convert to new fserror helpers
---
arch/alpha/include/uapi/asm/errno.h | 2
arch/mips/include/uapi/asm/errno.h | 2
arch/parisc/include/uapi/asm/errno.h | 2
arch/sparc/include/uapi/asm/errno.h | 2
fs/erofs/internal.h | 2
fs/ext2/ext2.h | 1
fs/ext4/ext4.h | 3 -
fs/f2fs/f2fs.h | 3 -
fs/minix/minix.h | 2
fs/udf/udf_sb.h | 2
fs/xfs/xfs_linux.h | 2
include/linux/fs/super_types.h | 7 +
include/linux/fserror.h | 93 ++++++++++++++++
include/linux/jbd2.h | 3 -
include/uapi/asm-generic/errno.h | 2
tools/arch/alpha/include/uapi/asm/errno.h | 2
tools/arch/mips/include/uapi/asm/errno.h | 2
tools/arch/parisc/include/uapi/asm/errno.h | 2
tools/arch/sparc/include/uapi/asm/errno.h | 2
tools/include/uapi/asm-generic/errno.h | 2
fs/Makefile | 2
fs/ext4/ioctl.c | 2
fs/ext4/super.c | 13 ++
fs/fserror.c | 168 ++++++++++++++++++++++++++++
fs/iomap/buffered-io.c | 23 ++++
fs/iomap/direct-io.c | 12 ++
fs/iomap/ioend.c | 6 +
fs/super.c | 3 +
fs/xfs/xfs_fsops.c | 4 +
fs/xfs/xfs_health.c | 14 ++
fs/xfs/xfs_notify_failure.c | 4 +
31 files changed, 365 insertions(+), 24 deletions(-)
create mode 100644 include/linux/fserror.h
create mode 100644 fs/fserror.c
^ permalink raw reply
* Re: [PATCH v4 1/3] init: remove deprecated "load_ramdisk" and "prompt_ramdisk" command line parameters
From: Askar Safin @ 2025-12-15 17:59 UTC (permalink / raw)
To: rdunlap
Cc: initramfs, linux-api, linux-arch, linux-block, linux-doc,
linux-fsdevel, linux-kernel, patches
In-Reply-To: <5c3c4233-3572-4842-850e-0a88ce16eee3@infradead.org>
Randy Dunlap <rdunlap@infradead.org>:
> Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
>
> Thanks.
Thank you!
P.S. For some reason I don't see your email in my Gmail, not even in the
spam folder.
--
Askar Safin
^ permalink raw reply
* Re: [PATCH v2] usb: gadget: f_midi: allow customizing the USB MIDI interface string through configfs
From: Takashi Iwai @ 2025-12-10 13:26 UTC (permalink / raw)
To: Victor Krawiec
Cc: gregkh, tiwai, corbet, jilliandonahue58, selvarasu.g, jkeeping,
linux-kernel, linux-usb, linux-doc, linux-api
In-Reply-To: <20251209164006.143219-1-victor.krawiec@arturia.com>
On Tue, 09 Dec 2025 17:40:06 +0100,
Victor Krawiec wrote:
>
> When using f_midi from configfs the USB MIDI interface string is hardcoded
> to 'MIDI function'.
>
> This USB string descriptor is used by some third-party OS or software to
> display the name of the MIDI device
>
> Since we add an additional string option a new macro block was created to
> factorize declarations
>
> Signed-off-by: Victor Krawiec <victor.krawiec@arturia.com>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
thanks,
Takashi
> ---
> V1 -> V2:
> - Add documentation
> - Cleanup unnecessary *_allocated boolean as requested in review
>
> .../ABI/testing/configfs-usb-gadget-midi | 17 +--
> Documentation/usb/gadget-testing.rst | 17 +--
> drivers/usb/gadget/function/f_midi.c | 110 ++++++++++--------
> drivers/usb/gadget/function/u_midi.h | 2 +-
> 4 files changed, 78 insertions(+), 68 deletions(-)
>
> diff --git a/Documentation/ABI/testing/configfs-usb-gadget-midi b/Documentation/ABI/testing/configfs-usb-gadget-midi
> index 07389cddd51a..d6bd67bb91fc 100644
> --- a/Documentation/ABI/testing/configfs-usb-gadget-midi
> +++ b/Documentation/ABI/testing/configfs-usb-gadget-midi
> @@ -4,11 +4,12 @@ KernelVersion: 3.19
> Description:
> The attributes:
>
> - ========== ====================================
> - index index value for the USB MIDI adapter
> - id ID string for the USB MIDI adapter
> - buflen MIDI buffer length
> - qlen USB read request queue length
> - in_ports number of MIDI input ports
> - out_ports number of MIDI output ports
> - ========== ====================================
> + ================ ====================================
> + index index value for the USB MIDI adapter
> + id ID string for the USB MIDI adapter
> + buflen MIDI buffer length
> + qlen USB read request queue length
> + in_ports number of MIDI input ports
> + out_ports number of MIDI output ports
> + interface_string USB AudioControl interface string
> + ================ ====================================
> diff --git a/Documentation/usb/gadget-testing.rst b/Documentation/usb/gadget-testing.rst
> index 5f90af1fb573..01a128d664cb 100644
> --- a/Documentation/usb/gadget-testing.rst
> +++ b/Documentation/usb/gadget-testing.rst
> @@ -368,14 +368,15 @@ Function-specific configfs interface
> The function name to use when creating the function directory is "midi".
> The MIDI function provides these attributes in its function directory:
>
> - =============== ====================================
> - buflen MIDI buffer length
> - id ID string for the USB MIDI adapter
> - in_ports number of MIDI input ports
> - index index value for the USB MIDI adapter
> - out_ports number of MIDI output ports
> - qlen USB read request queue length
> - =============== ====================================
> + ================ ====================================
> + buflen MIDI buffer length
> + id ID string for the USB MIDI adapter
> + in_ports number of MIDI input ports
> + index index value for the USB MIDI adapter
> + out_ports number of MIDI output ports
> + qlen USB read request queue length
> + interface_string USB AudioControl interface string
> + ================ ====================================
>
> Testing the MIDI function
> -------------------------
> diff --git a/drivers/usb/gadget/function/f_midi.c b/drivers/usb/gadget/function/f_midi.c
> index da82598fcef8..ad679a6ecac1 100644
> --- a/drivers/usb/gadget/function/f_midi.c
> +++ b/drivers/usb/gadget/function/f_midi.c
> @@ -875,6 +875,7 @@ static int f_midi_bind(struct usb_configuration *c, struct usb_function *f)
> struct usb_composite_dev *cdev = c->cdev;
> struct f_midi *midi = func_to_midi(f);
> struct usb_string *us;
> + struct f_midi_opts *opts;
> int status, n, jack = 1, i = 0, endpoint_descriptor_index = 0;
>
> midi->gadget = cdev->gadget;
> @@ -883,6 +884,10 @@ static int f_midi_bind(struct usb_configuration *c, struct usb_function *f)
> if (status < 0)
> goto fail_register;
>
> + opts = container_of(f->fi, struct f_midi_opts, func_inst);
> + if (opts->interface_string)
> + midi_string_defs[STRING_FUNC_IDX].s = opts->interface_string;
> +
> /* maybe allocate device-global string ID */
> us = usb_gstrings_attach(c->cdev, midi_strings,
> ARRAY_SIZE(midi_string_defs));
> @@ -1178,59 +1183,60 @@ end: \
> \
> CONFIGFS_ATTR(f_midi_opts_, name);
>
> +#define F_MIDI_OPT_STRING(name) \
> +static ssize_t f_midi_opts_##name##_show(struct config_item *item, char *page) \
> +{ \
> + struct f_midi_opts *opts = to_f_midi_opts(item); \
> + ssize_t result; \
> + \
> + mutex_lock(&opts->lock); \
> + if (opts->name) { \
> + result = strscpy(page, opts->name, PAGE_SIZE); \
> + } else { \
> + page[0] = 0; \
> + result = 0; \
> + } \
> + \
> + mutex_unlock(&opts->lock); \
> + \
> + return result; \
> +} \
> + \
> +static ssize_t f_midi_opts_##name##_store(struct config_item *item, \
> + const char *page, size_t len) \
> +{ \
> + struct f_midi_opts *opts = to_f_midi_opts(item); \
> + int ret; \
> + char *c; \
> + \
> + mutex_lock(&opts->lock); \
> + if (opts->refcnt > 1) { \
> + ret = -EBUSY; \
> + goto end; \
> + } \
> + \
> + c = kstrndup(page, len, GFP_KERNEL); \
> + if (!c) { \
> + ret = -ENOMEM; \
> + goto end; \
> + } \
> + kfree(opts->name); \
> + opts->name = c; \
> + ret = len; \
> +end: \
> + mutex_unlock(&opts->lock); \
> + return ret; \
> +} \
> + \
> +CONFIGFS_ATTR(f_midi_opts_, name)
> +
> F_MIDI_OPT_SIGNED(index, true, SNDRV_CARDS);
> F_MIDI_OPT(buflen, false, 0);
> F_MIDI_OPT(qlen, false, 0);
> F_MIDI_OPT(in_ports, true, MAX_PORTS);
> F_MIDI_OPT(out_ports, true, MAX_PORTS);
> -
> -static ssize_t f_midi_opts_id_show(struct config_item *item, char *page)
> -{
> - struct f_midi_opts *opts = to_f_midi_opts(item);
> - ssize_t result;
> -
> - mutex_lock(&opts->lock);
> - if (opts->id) {
> - result = strscpy(page, opts->id, PAGE_SIZE);
> - } else {
> - page[0] = 0;
> - result = 0;
> - }
> -
> - mutex_unlock(&opts->lock);
> -
> - return result;
> -}
> -
> -static ssize_t f_midi_opts_id_store(struct config_item *item,
> - const char *page, size_t len)
> -{
> - struct f_midi_opts *opts = to_f_midi_opts(item);
> - int ret;
> - char *c;
> -
> - mutex_lock(&opts->lock);
> - if (opts->refcnt > 1) {
> - ret = -EBUSY;
> - goto end;
> - }
> -
> - c = kstrndup(page, len, GFP_KERNEL);
> - if (!c) {
> - ret = -ENOMEM;
> - goto end;
> - }
> - if (opts->id_allocated)
> - kfree(opts->id);
> - opts->id = c;
> - opts->id_allocated = true;
> - ret = len;
> -end:
> - mutex_unlock(&opts->lock);
> - return ret;
> -}
> -
> -CONFIGFS_ATTR(f_midi_opts_, id);
> +F_MIDI_OPT_STRING(id);
> +F_MIDI_OPT_STRING(interface_string);
>
> static struct configfs_attribute *midi_attrs[] = {
> &f_midi_opts_attr_index,
> @@ -1239,6 +1245,7 @@ static struct configfs_attribute *midi_attrs[] = {
> &f_midi_opts_attr_in_ports,
> &f_midi_opts_attr_out_ports,
> &f_midi_opts_attr_id,
> + &f_midi_opts_attr_interface_string,
> NULL,
> };
>
> @@ -1262,8 +1269,8 @@ static void f_midi_free_inst(struct usb_function_instance *f)
> mutex_unlock(&opts->lock);
>
> if (free) {
> - if (opts->id_allocated)
> - kfree(opts->id);
> + kfree(opts->id);
> + kfree(opts->interface_string);
> kfree(opts);
> }
> }
> @@ -1279,7 +1286,8 @@ static struct usb_function_instance *f_midi_alloc_inst(void)
> mutex_init(&opts->lock);
> opts->func_inst.free_func_inst = f_midi_free_inst;
> opts->index = SNDRV_DEFAULT_IDX1;
> - opts->id = SNDRV_DEFAULT_STR1;
> + opts->id = NULL;
> + opts->interface_string = NULL;
> opts->buflen = 512;
> opts->qlen = 32;
> opts->in_ports = 1;
> diff --git a/drivers/usb/gadget/function/u_midi.h b/drivers/usb/gadget/function/u_midi.h
> index 2e400b495cb8..41cb8aa73f09 100644
> --- a/drivers/usb/gadget/function/u_midi.h
> +++ b/drivers/usb/gadget/function/u_midi.h
> @@ -19,7 +19,7 @@ struct f_midi_opts {
> struct usb_function_instance func_inst;
> int index;
> char *id;
> - bool id_allocated;
> + char *interface_string;
> unsigned int in_ports;
> unsigned int out_ports;
> unsigned int buflen;
>
> base-commit: 67a454e6b1c604555c04501c77b7fedc5d98a779
> --
> 2.43.0
>
^ permalink raw reply
* [PATCH v2] usb: gadget: f_midi: allow customizing the USB MIDI interface string through configfs
From: Victor Krawiec @ 2025-12-09 16:40 UTC (permalink / raw)
To: gregkh
Cc: tiwai, corbet, jilliandonahue58, selvarasu.g, jkeeping,
linux-kernel, linux-usb, linux-doc, linux-api, Victor Krawiec
When using f_midi from configfs, the USB MIDI interface string is hardcoded
to 'MIDI function'.
Some third-party operating systems and software use this USB string
descriptor to display the name of the MIDI device.
Since this adds a second string option, a new macro block
(F_MIDI_OPT_STRING) was created to factor out the show/store declarations.
Signed-off-by: Victor Krawiec <victor.krawiec@arturia.com>
---
V1 -> V2:
- Add documentation
- Cleanup unnecessary *_allocated boolean as requested in review
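
The boolean cleanup noted above relies on kfree(NULL) being a no-op: once the
option pointer defaults to NULL and is only ever assigned heap memory, no
separate *_allocated flag is needed. A minimal userspace sketch of that
ownership pattern follows, with free() and strndup() standing in for kfree()
and kstrndup(). The helper name opt_string_store() is hypothetical and is not
part of the patch; locking and the refcnt check are left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical userspace analogue of the _store() half of F_MIDI_OPT_STRING.
 * *slot is either NULL or points to memory owned by the option.
 */
static ssize_t opt_string_store(char **slot, const char *page, size_t len)
{
	char *c = strndup(page, len);	/* stands in for kstrndup() */

	if (!c)
		return -1;		/* the kernel code returns -ENOMEM here */

	free(*slot);			/* free(NULL) is a no-op, like kfree(NULL) */
	*slot = c;
	return (ssize_t)len;
}
```

Because the slot starts out as NULL, the first store and every later store
take the same path, which is what lets the macro serve both id and
interface_string.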
.../ABI/testing/configfs-usb-gadget-midi | 17 +--
Documentation/usb/gadget-testing.rst | 17 +--
drivers/usb/gadget/function/f_midi.c | 110 ++++++++++--------
drivers/usb/gadget/function/u_midi.h | 2 +-
4 files changed, 78 insertions(+), 68 deletions(-)
diff --git a/Documentation/ABI/testing/configfs-usb-gadget-midi b/Documentation/ABI/testing/configfs-usb-gadget-midi
index 07389cddd51a..d6bd67bb91fc 100644
--- a/Documentation/ABI/testing/configfs-usb-gadget-midi
+++ b/Documentation/ABI/testing/configfs-usb-gadget-midi
@@ -4,11 +4,12 @@ KernelVersion: 3.19
Description:
The attributes:
-  ==========  ====================================
-  index       index value for the USB MIDI adapter
-  id          ID string for the USB MIDI adapter
-  buflen      MIDI buffer length
-  qlen        USB read request queue length
-  in_ports    number of MIDI input ports
-  out_ports   number of MIDI output ports
-  ==========  ====================================
+  ================  ====================================
+  index             index value for the USB MIDI adapter
+  id                ID string for the USB MIDI adapter
+  buflen            MIDI buffer length
+  qlen              USB read request queue length
+  in_ports          number of MIDI input ports
+  out_ports         number of MIDI output ports
+  interface_string  USB AudioControl interface string
+  ================  ====================================
diff --git a/Documentation/usb/gadget-testing.rst b/Documentation/usb/gadget-testing.rst
index 5f90af1fb573..01a128d664cb 100644
--- a/Documentation/usb/gadget-testing.rst
+++ b/Documentation/usb/gadget-testing.rst
@@ -368,14 +368,15 @@ Function-specific configfs interface
The function name to use when creating the function directory is "midi".
The MIDI function provides these attributes in its function directory:
-  ===============  ====================================
-  buflen           MIDI buffer length
-  id               ID string for the USB MIDI adapter
-  in_ports         number of MIDI input ports
-  index            index value for the USB MIDI adapter
-  out_ports        number of MIDI output ports
-  qlen             USB read request queue length
-  ===============  ====================================
+  ================  ====================================
+  buflen            MIDI buffer length
+  id                ID string for the USB MIDI adapter
+  in_ports          number of MIDI input ports
+  index             index value for the USB MIDI adapter
+  out_ports         number of MIDI output ports
+  qlen              USB read request queue length
+  interface_string  USB AudioControl interface string
+  ================  ====================================
Testing the MIDI function
-------------------------
diff --git a/drivers/usb/gadget/function/f_midi.c b/drivers/usb/gadget/function/f_midi.c
index da82598fcef8..ad679a6ecac1 100644
--- a/drivers/usb/gadget/function/f_midi.c
+++ b/drivers/usb/gadget/function/f_midi.c
@@ -875,6 +875,7 @@ static int f_midi_bind(struct usb_configuration *c, struct usb_function *f)
struct usb_composite_dev *cdev = c->cdev;
struct f_midi *midi = func_to_midi(f);
struct usb_string *us;
+ struct f_midi_opts *opts;
int status, n, jack = 1, i = 0, endpoint_descriptor_index = 0;
midi->gadget = cdev->gadget;
@@ -883,6 +884,10 @@ static int f_midi_bind(struct usb_configuration *c, struct usb_function *f)
if (status < 0)
goto fail_register;
+ opts = container_of(f->fi, struct f_midi_opts, func_inst);
+ if (opts->interface_string)
+ midi_string_defs[STRING_FUNC_IDX].s = opts->interface_string;
+
/* maybe allocate device-global string ID */
us = usb_gstrings_attach(c->cdev, midi_strings,
ARRAY_SIZE(midi_string_defs));
@@ -1178,59 +1183,60 @@ end: \
\
CONFIGFS_ATTR(f_midi_opts_, name);
+#define F_MIDI_OPT_STRING(name) \
+static ssize_t f_midi_opts_##name##_show(struct config_item *item, char *page) \
+{ \
+ struct f_midi_opts *opts = to_f_midi_opts(item); \
+ ssize_t result; \
+ \
+ mutex_lock(&opts->lock); \
+ if (opts->name) { \
+ result = strscpy(page, opts->name, PAGE_SIZE); \
+ } else { \
+ page[0] = 0; \
+ result = 0; \
+ } \
+ \
+ mutex_unlock(&opts->lock); \
+ \
+ return result; \
+} \
+ \
+static ssize_t f_midi_opts_##name##_store(struct config_item *item, \
+ const char *page, size_t len) \
+{ \
+ struct f_midi_opts *opts = to_f_midi_opts(item); \
+ int ret; \
+ char *c; \
+ \
+ mutex_lock(&opts->lock); \
+ if (opts->refcnt > 1) { \
+ ret = -EBUSY; \
+ goto end; \
+ } \
+ \
+ c = kstrndup(page, len, GFP_KERNEL); \
+ if (!c) { \
+ ret = -ENOMEM; \
+ goto end; \
+ } \
+ kfree(opts->name); \
+ opts->name = c; \
+ ret = len; \
+end: \
+ mutex_unlock(&opts->lock); \
+ return ret; \
+} \
+ \
+CONFIGFS_ATTR(f_midi_opts_, name)
+
F_MIDI_OPT_SIGNED(index, true, SNDRV_CARDS);
F_MIDI_OPT(buflen, false, 0);
F_MIDI_OPT(qlen, false, 0);
F_MIDI_OPT(in_ports, true, MAX_PORTS);
F_MIDI_OPT(out_ports, true, MAX_PORTS);
-
-static ssize_t f_midi_opts_id_show(struct config_item *item, char *page)
-{
- struct f_midi_opts *opts = to_f_midi_opts(item);
- ssize_t result;
-
- mutex_lock(&opts->lock);
- if (opts->id) {
- result = strscpy(page, opts->id, PAGE_SIZE);
- } else {
- page[0] = 0;
- result = 0;
- }
-
- mutex_unlock(&opts->lock);
-
- return result;
-}
-
-static ssize_t f_midi_opts_id_store(struct config_item *item,
- const char *page, size_t len)
-{
- struct f_midi_opts *opts = to_f_midi_opts(item);
- int ret;
- char *c;
-
- mutex_lock(&opts->lock);
- if (opts->refcnt > 1) {
- ret = -EBUSY;
- goto end;
- }
-
- c = kstrndup(page, len, GFP_KERNEL);
- if (!c) {
- ret = -ENOMEM;
- goto end;
- }
- if (opts->id_allocated)
- kfree(opts->id);
- opts->id = c;
- opts->id_allocated = true;
- ret = len;
-end:
- mutex_unlock(&opts->lock);
- return ret;
-}
-
-CONFIGFS_ATTR(f_midi_opts_, id);
+F_MIDI_OPT_STRING(id);
+F_MIDI_OPT_STRING(interface_string);
static struct configfs_attribute *midi_attrs[] = {
&f_midi_opts_attr_index,
@@ -1239,6 +1245,7 @@ static struct configfs_attribute *midi_attrs[] = {
&f_midi_opts_attr_in_ports,
&f_midi_opts_attr_out_ports,
&f_midi_opts_attr_id,
+ &f_midi_opts_attr_interface_string,
NULL,
};
@@ -1262,8 +1269,8 @@ static void f_midi_free_inst(struct usb_function_instance *f)
mutex_unlock(&opts->lock);
if (free) {
- if (opts->id_allocated)
- kfree(opts->id);
+ kfree(opts->id);
+ kfree(opts->interface_string);
kfree(opts);
}
}
@@ -1279,7 +1286,8 @@ static struct usb_function_instance *f_midi_alloc_inst(void)
mutex_init(&opts->lock);
opts->func_inst.free_func_inst = f_midi_free_inst;
opts->index = SNDRV_DEFAULT_IDX1;
- opts->id = SNDRV_DEFAULT_STR1;
+ opts->id = NULL;
+ opts->interface_string = NULL;
opts->buflen = 512;
opts->qlen = 32;
opts->in_ports = 1;
diff --git a/drivers/usb/gadget/function/u_midi.h b/drivers/usb/gadget/function/u_midi.h
index 2e400b495cb8..41cb8aa73f09 100644
--- a/drivers/usb/gadget/function/u_midi.h
+++ b/drivers/usb/gadget/function/u_midi.h
@@ -19,7 +19,7 @@ struct f_midi_opts {
struct usb_function_instance func_inst;
int index;
char *id;
- bool id_allocated;
+ char *interface_string;
unsigned int in_ports;
unsigned int out_ports;
unsigned int buflen;
base-commit: 67a454e6b1c604555c04501c77b7fedc5d98a779
--
2.43.0
^ permalink raw reply related
* Re: Maintainers / Kernel Summit 2021 planning kick-off
From: Abdullah Alamri @ 2025-12-03 1:15 UTC (permalink / raw)
To: torvalds
Cc: James.Bottomley, cl, david, greg, jikos, ksummit, linux-api,
linux-arch, linux-block, linux-fsdevel, linux-kernel, linux-mm,
lkml, netdev, tytso
Sent from iPhone
^ permalink raw reply
* Re: [PATCH] sched/deadline: Add reporting of runtime left & abs deadline to sched_getattr() for DEADLINE tasks
From: Matteo Martelli @ 2025-12-01 15:09 UTC (permalink / raw)
To: Tommaso Cucinotta
Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
linux-kernel, linux-api, Tommaso Cucinotta, Tommaso Cucinotta,
Peter Zijlstra
In-Reply-To: <20250912053937.31636-2-tommaso.cucinotta@santannapisa.it>
Hi Tommaso,
On Fri, 12 Sep 2025 07:38:29 +0200, Tommaso Cucinotta <tommaso.cucinotta@gmail.com> wrote:
> The SCHED_DEADLINE scheduler allows reading the statically configured
> run-time, deadline, and period parameters through the sched_getattr()
> system call. However, there is no immediate way to access, from user space,
> the current parameters used within the scheduler: the instantaneous runtime
> left in the current cycle, as well as the current absolute deadline.
>
> The `flags' sched_getattr() parameter, so far mandated to contain zero,
> now supports the SCHED_GETATTR_FLAG_DL_DYNAMIC=1 flag, to request
> retrieval of the leftover runtime and absolute deadline, converted to a
> CLOCK_MONOTONIC reference, instead of the statically configured parameters.
>
> This feature is useful for adaptive SCHED_DEADLINE tasks that need to
> modify their behavior depending on whether or not there is enough runtime
> left in the current period, and/or what is the current absolute deadline.
>
> Notes:
> - before returning the instantaneous parameters, the runtime is updated;
> - the abs deadline is returned shifted from rq_clock() to ktime_get_ns(),
> in CLOCK_MONOTONIC reference; this causes multiple invocations from the
> same period to return values that may differ for a few ns (showing some
> small drift), albeit the deadline doesn't move, in rq_clock() reference;
> - the abs deadline value returned to user-space, as unsigned 64-bit value,
> can represent nearly 585 years since boot time;
> - setting flags=0 provides the old behavior (retrieve static parameters).
>
> See also the notes from discussion held at OSPM 2025 on the topic
> "Making user space aware of current deadline-scheduler parameters".
>
> Signed-off-by: Tommaso Cucinotta <tommaso.cucinotta@santannapisa.it>
> ...
I tested your patch and can confirm that I could retrieve the remaining runtime
and absolute deadline values via sched_getattr(), as described in your cover
letter. I'm involved in a project where the deadline scheduler is used in
critical realtime applications, which are monitored so that the system can
react to misbehaviours, for instance if the applications overrun their
WCET or miss their expected deadlines.
On top of the reasons you explained at OSPM 2025, I think this patch could also
be useful for monitoring, as it would let userspace retrieve a more
accurate runtime value than the typical estimate computed via
clock_gettime(CLOCK_THREAD_CPUTIME_ID). To my understanding, the runtime_left
retrieved with sched_getattr() is also more representative of the actual
remaining task budget when the runtime is scaled in dl_scaled_delta_exec() to
calculate the time invariant task utilization [1] for energy-aware scheduling
[2]. Moreover, this patch would allow the application to detect deadline
misses.
Tested-by: Matteo Martelli <matteo.martelli@codethink.co.uk>
Best regards,
Matteo Martelli
[1]: https://www.kernel.org/doc/html/v6.17/scheduler/sched-capacity.html#frequency-invariance
[2]: https://www.kernel.org/doc/html/v6.17/scheduler/sched-deadline.html#energy-aware-scheduling
^ permalink raw reply
* Re: [PATCH 0/2] man7/ip.7: Clarify PKTINFO's docs
From: Alejandro Colomar @ 2025-11-26 21:06 UTC (permalink / raw)
To: dzwdz; +Cc: linux-man, LKML, Linux API, ej
In-Reply-To: <7zsiwtuwz3ybq6cymwkfbp2fxiliof25ko2zk77cgitfsxxgc6@x5jbr76h3f4s>
Hi Jakub,
On Tue, Nov 25, 2025 at 01:13:05PM +0100, Alejandro Colomar wrote:
> Hi dzwdz,
>
> On Tue, Nov 25, 2025 at 02:50:19AM +0100, dzwdz wrote:
> > On 11/18/25 14:51, Alejandro Colomar wrote:
> > > Do you suggest moving each socket option to a manual page under
> > > man2const/? I think I agree with that. There's precedent, and it makes
> > > the pages more readable.
> >
> > In general - yes, definitely!
>
> Done. I've split ip(7) into a large amount of small pages this weekend.
> Please have a look at them and suggest any improvements you consider
> appropriate. ;)
>
> > However, struct in_pktinfo can be passed to sendmsg even if IP_PKTINFO isn't
> > set, so I don't think it would make sense to document it in e.g.
> > IP_PKTINFO(2const) - it should probably get its own manpage in man2type.
> > That option, in turn, only makes sense in the context of that struct, so I
> > think it should probably be documented in in_pktinfo(2type).
> >
> > This would /kinda/ be like how e.g. PA_INT(3const) points to
> > printf.h(3head), I guess?
> >
> > I'd be happy to try writing that manpage if you think this approach makes
> > sense :)
>
> Yup, it makes sense. :)
>
> I'll simplify your work by doing some initial changes. Please wait
> a couple of days before starting, so I can finish doing that.
Done. Please fetch the latest changes, and do what you consider
appropriate. :)
Have a lovely night!
Alex
>
>
> Have a lovely day!
> Alex
>
> >
> > Thanks,
> > dzwdz
>
>
>
>
> --
> <https://www.alejandro-colomar.es>
> Use port 80 (that is, <...:80/>).
--
<https://www.alejandro-colomar.es>
Use port 80 (that is, <...:80/>).
^ permalink raw reply
* Re: [PATCH v8 00/18] Live Update Orchestrator
From: Pasha Tatashin @ 2025-11-25 19:01 UTC (permalink / raw)
To: David Matlack
Cc: pratyush, jasonmiu, graf, rppt, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aSX19cWypvh1mKWM@google.com>
On Tue, Nov 25, 2025 at 1:31 PM David Matlack <dmatlack@google.com> wrote:
>
> On 2025-11-25 11:58 AM, Pasha Tatashin wrote:
> >
> > Pasha Tatashin (12):
> > liveupdate: luo_core: Live Update Orchestrato,
> > liveupdate: luo_core: integrate with KHO
> > kexec: call liveupdate_reboot() before kexec
> > liveupdate: luo_session: add sessions support
> > liveupdate: luo_core: add user interface
> > liveupdate: luo_file: implement file systems callbacks
> > liveupdate: luo_session: Add ioctls for file preservation
> > docs: add luo documentation
> > MAINTAINERS: add liveupdate entry
> > selftests/liveupdate: Add userspace API selftests
> > selftests/liveupdate: Add simple kexec-based selftest for LUO
> > selftests/liveupdate: Add kexec test for multiple and empty sessions
> >
> > Pratyush Yadav (6):
> > mm: shmem: use SHMEM_F_* flags instead of VM_* flags
> > mm: shmem: allow freezing inode mapping
> > mm: shmem: export some functions to internal.h
> > liveupdate: luo_file: add private argument to store runtime state
> > mm: memfd_luo: allow preserving memfd
> > docs: add documentation for memfd preservation via LUO
>
> I ran all the new selftests, including those that require kexec on an
> Intel EMR server, and all tests passed.
>
> Tested-by: David Matlack <dmatlack@google.com>
Great, thank you David!
Pasha
^ permalink raw reply
* Re: [PATCH v8 02/18] liveupdate: luo_core: integrate with KHO
From: Pasha Tatashin @ 2025-11-25 19:01 UTC (permalink / raw)
To: Mike Rapoport
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aSX7Nm_yrXHeejQU@kernel.org>
On Tue, Nov 25, 2025 at 1:54 PM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Tue, Nov 25, 2025 at 11:58:32AM -0500, Pasha Tatashin wrote:
> > Integrate the LUO with the KHO framework to enable passing LUO state
> > across a kexec reboot.
> >
> > This patch implements the lifecycle integration with KHO:
> >
> > 1. Incoming State: During early boot (`early_initcall`), LUO checks if
> > KHO is active. If so, it retrieves the "LUO" subtree, verifies the
> > "luo-v1" compatibility string, and reads the `liveupdate-number` to
> > track the update count.
> >
> > 2. Outgoing State: During late initialization (`late_initcall`), LUO
> > allocates a new FDT for the next kernel, populates it with the basic
> > header (compatible string and incremented update number), and
> > registers it with KHO (`kho_add_subtree`).
> >
> > 3. Finalization: The `liveupdate_reboot()` notifier is updated to invoke
> > `kho_finalize()`. This ensures that all memory segments marked for
> > preservation are properly serialized before the kexec jump.
> >
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> > Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
>
> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Thank you!
Pasha
^ permalink raw reply
* Re: [PATCH v8 01/18] liveupdate: luo_core: Live Update Orchestrato,
From: Pasha Tatashin @ 2025-11-25 18:54 UTC (permalink / raw)
To: Mike Rapoport
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aSX6sQqwwA6I2mxW@kernel.org>
On Tue, Nov 25, 2025 at 1:51 PM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Tue, Nov 25, 2025 at 11:58:31AM -0500, Pasha Tatashin wrote:
> > Subject: [PATCH v8 01/18] liveupdate: luo_core: Live Update Orchestrato,
>
> ^ Orchestrator
I like the sound of 'Orchestrato' :-)))))
Thanks,
Pasha
>
> > Introduce LUO, a mechanism intended to facilitate kernel updates while
> > keeping designated devices operational across the transition (e.g., via
> > kexec). The primary use case is updating hypervisors with minimal
> > disruption to running virtual machines. For userspace side of hypervisor
> > update we have copyless migration. LUO is for updating the kernel.
>
> --
> Sincerely yours,
> Mike.
^ permalink raw reply
* Re: [PATCH v8 02/18] liveupdate: luo_core: integrate with KHO
From: Mike Rapoport @ 2025-11-25 18:53 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251125165850.3389713-3-pasha.tatashin@soleen.com>
On Tue, Nov 25, 2025 at 11:58:32AM -0500, Pasha Tatashin wrote:
> Integrate the LUO with the KHO framework to enable passing LUO state
> across a kexec reboot.
>
> This patch implements the lifecycle integration with KHO:
>
> 1. Incoming State: During early boot (`early_initcall`), LUO checks if
> KHO is active. If so, it retrieves the "LUO" subtree, verifies the
> "luo-v1" compatibility string, and reads the `liveupdate-number` to
> track the update count.
>
> 2. Outgoing State: During late initialization (`late_initcall`), LUO
> allocates a new FDT for the next kernel, populates it with the basic
> header (compatible string and incremented update number), and
> registers it with KHO (`kho_add_subtree`).
>
> 3. Finalization: The `liveupdate_reboot()` notifier is updated to invoke
> `kho_finalize()`. This ensures that all memory segments marked for
> preservation are properly serialized before the kexec jump.
>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> ---
> include/linux/kho/abi/luo.h | 58 ++++++++++++
> kernel/liveupdate/luo_core.c | 154 ++++++++++++++++++++++++++++++-
> kernel/liveupdate/luo_internal.h | 22 +++++
> 3 files changed, 233 insertions(+), 1 deletion(-)
> create mode 100644 include/linux/kho/abi/luo.h
> create mode 100644 kernel/liveupdate/luo_internal.h
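
The incoming/outgoing handshake described in the quoted commit message can be
modelled in plain userspace C. This is only a sketch of the logic, not the
kernel implementation: the struct and function names are hypothetical, the FDT
is replaced by an in-memory struct, and treating "KHO inactive" as an update
count of 0 is an assumption. Only the "luo-v1" string and the incremented
update number come from the description above.

```c
#include <stdint.h>
#include <string.h>

#define LUO_COMPATIBLE "luo-v1"	/* compatibility string named in the patch */

/* Hypothetical stand-in for the header stored in the LUO subtree. */
struct luo_header_model {
	const char *compatible;
	uint64_t liveupdate_number;
};

/*
 * Incoming state: accept the header only if the compatibility string
 * matches. in == NULL models "KHO is not active" (assumed count of 0).
 * Returns 0 on success and -1 on a compatibility mismatch.
 */
static int luo_model_incoming(const struct luo_header_model *in,
			      uint64_t *count)
{
	if (!in) {
		*count = 0;
		return 0;
	}
	if (!in->compatible || strcmp(in->compatible, LUO_COMPATIBLE) != 0)
		return -1;
	*count = in->liveupdate_number;
	return 0;
}

/* Outgoing state: header for the next kernel with the number incremented. */
static struct luo_header_model luo_model_outgoing(uint64_t count)
{
	struct luo_header_model out = { LUO_COMPATIBLE, count + 1 };

	return out;
}
```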
--
Sincerely yours,
Mike.
^ permalink raw reply
* Re: [PATCH v8 01/18] liveupdate: luo_core: Live Update Orchestrato,
From: Mike Rapoport @ 2025-11-25 18:51 UTC (permalink / raw)
To: Pasha Tatashin
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <20251125165850.3389713-2-pasha.tatashin@soleen.com>
On Tue, Nov 25, 2025 at 11:58:31AM -0500, Pasha Tatashin wrote:
> Subject: [PATCH v8 01/18] liveupdate: luo_core: Live Update Orchestrato,
^ Orchestrator
> Introduce LUO, a mechanism intended to facilitate kernel updates while
> keeping designated devices operational across the transition (e.g., via
> kexec). The primary use case is updating hypervisors with minimal
> disruption to running virtual machines. For userspace side of hypervisor
> update we have copyless migration. LUO is for updating the kernel.
--
Sincerely yours,
Mike.
^ permalink raw reply
* Re: [PATCH v7 19/22] selftests/liveupdate: add test infrastructure and scripts
From: Pasha Tatashin @ 2025-11-25 18:42 UTC (permalink / raw)
To: Mike Rapoport
Cc: pratyush, jasonmiu, graf, dmatlack, rientjes, corbet, rdunlap,
ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm, tj,
yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, linux,
linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, ptyadav,
lennart, brauner, linux-api, linux-fsdevel, saeedm, ajayachandra,
jgg, parav, leonro, witu, hughd, skhawaja, chrisl
In-Reply-To: <aSQPNuFIv0rRr2tp@kernel.org>
On Mon, Nov 24, 2025 at 2:54 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Sat, Nov 22, 2025 at 05:23:46PM -0500, Pasha Tatashin wrote:
> > Subject: [PATCH v7 19/22] selftests/liveupdate: add test infrastructure and scripts
>
> Maybe ^ end to end
Done.
>
> > Add the testing infrastructure required to verify the liveupdate
> > feature. This includes a custom init process, a test orchestration
> > script, and a batch runner.
>
> And say here that it's end to end test.
Done
> > +static int is_stage_2(void)
> > +{
> > + char cmdline[COMMAND_LINE_SIZE];
> > + ssize_t len;
> > + int fd;
> > +
> > + fd = open("/proc/cmdline", O_RDONLY);
> > + if (fd < 0)
> > + return 0;
> > +
> > + len = read(fd, cmdline, sizeof(cmdline) - 1);
> > + close(fd);
> > +
> > + if (len < 0)
> > + return 0;
>
> Shouldn't we bail out of the test if read of command line failed?
Sure, done.
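
One way to make the failure visible, sketched here under assumptions, is a
tri-state helper that distinguishes "token absent" from "could not read the
command line", so the caller can bail out on the latter. The helper name, the
buffer size, and the example token "luo_stage=2" are hypothetical; the quoted
code uses COMMAND_LINE_SIZE and its own matching logic, which is not shown in
this thread.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define CMDLINE_BUF_SIZE 4096	/* assumption; the patch uses COMMAND_LINE_SIZE */

/*
 * Returns 1 if token occurs in the file at path, 0 if it does not,
 * and -1 if the file could not be opened or read.
 */
static int cmdline_has_token(const char *path, const char *token)
{
	char buf[CMDLINE_BUF_SIZE];
	ssize_t len;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	len = read(fd, buf, sizeof(buf) - 1);
	close(fd);
	if (len < 0)
		return -1;

	buf[len] = '\0';	/* read() does not NUL-terminate */
	return strstr(buf, token) != NULL;
}
```

A caller such as the init process could then treat a negative result from
cmdline_has_token("/proc/cmdline", "luo_stage=2") as a fatal test error
instead of silently assuming stage 1.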
> > +function cleanup() {
> > + local exit_code=$?
> > +
> > + if [ -z "$workspace_dir" ]; then
> > + ktap_finished
> > + return
> > + fi
> > +
> > + if [ $exit_code -ne 0 ]; then
> > + echo "# Test failed (exit code $exit_code)."
> > + echo "# Workspace preserved at: $workspace_dir"
> > + elif [ "$KEEP_WORKSPACE" -eq 1 ]; then
> > + echo "# Workspace preserved (user request) at: $workspace_dir"
> > + else
> > + rm -fr "$workspace_dir"
> > + fi
> > + ktap_finished
>
> exit $exit_code
Done
> > +function build_kernel() {
> > + local build_dir=$1
> > + local make_cmd=$2
> > + local kimage=$3
> > + local target_arch=$4
> > +
> > + local kconfig="$build_dir/.config"
> > + local common_conf="$test_dir/config"
> > + local arch_conf="$test_dir/config.$target_arch"
> > +
> > + echo "# Building kernel in: $build_dir"
> > + $make_cmd defconfig
> > +
> > + local fragments=""
> > + if [[ -f "$common_conf" ]]; then
> > + fragments="$fragments $common_conf"
> > + fi
>
> Without this CONFIG_LIVEUPDATE won't be set
> > +
> > + if [[ -f "$arch_conf" ]]; then
> > + fragments="$fragments $arch_conf"
> > + fi
> > +
> > + if [[ -n "$fragments" ]]; then
> > + "$kernel_dir/scripts/kconfig/merge_config.sh" \
> > + -Q -m -O "$build_dir" "$kconfig" $fragments >> /dev/null
> > + fi
>
> I believe you can just
>
> cat $common_conf $fragments > $build_dir/.config
> make olddefconfig
>
> without running defconfig at the beginning
> It will build faster, just make sure to add CONFIG_SERIAL_ to $arch_conf
I will look into that and see how the performance really changes. I liked
using merge_config.sh because it does not print warnings.
>
> > + $make_cmd olddefconfig
> > + $make_cmd "$kimage"
> > + $make_cmd headers_install INSTALL_HDR_PATH="$headers_dir"
> > +}
> > +
> > +function mkinitrd() {
> > + local build_dir=$1
> > + local kernel_path=$2
> > + local test_name=$3
> > +
> > + # 1. Compile the test binary and the init process
>
> Didn't find 2. ;-)
> Don't think we want the numbering here, plain comments are fine
Updated comment.
>
> > + "$CROSS_COMPILE"gcc -static -O2 \
> > + -I "$headers_dir/include" \
> > + -I "$test_dir" \
> > + -o "$workspace_dir/test_binary" \
> > + "$test_dir/$test_name.c" "$test_dir/luo_test_utils.c"
>
> This will have hard time cross-compiling with -nolibc toolchains
Hm, it works for me. I am not sure about a nolibc cross compiler; am I
missing something?
>
> > +
> > + "$CROSS_COMPILE"gcc -s -static -Os -nostdinc -nostdlib \
> > + -fno-asynchronous-unwind-tables -fno-ident \
> > + -fno-stack-protector \
> > + -I "$headers_dir/include" \
> > + -I "$kernel_dir/tools/include/nolibc" \
> > + -o "$workspace_dir/init" "$test_dir/init.c"
>
> This failed for me with gcc 14.2.0 (Debian 14.2.0-19):
Updated: removed the extra const and static.
>
> /home/mike/git/linux/tools/testing/selftests/liveupdate/init.c: In function ‘run_test’:
> /home/mike/git/linux/tools/testing/selftests/liveupdate/init.c:111:65: error: initializer element is not constant
> 111 | static const char *const argv[] = {TEST_BINARY, stage_arg, NULL};
> | ^~~~~~~~~
>
> /home/mike/git/linux/tools/testing/selftests/liveupdate/init.c:111:65: note: (near initialization for ‘argv[1]’)
> /home/mike/git/linux/tools/testing/selftests/liveupdate/init.c:113:37: error: passing argument 2 of ‘execve’ from incompatible pointer type [-Wincompatible-pointer-types]
> 113 | execve(TEST_BINARY, argv, NULL);
> | ^~~~
> | |
> | const char * const*
> In file included from /home/mike/git/linux/tools/testing/selftests/liveupdate/init.c:16:
> /usr/include/unistd.h:572:52: note: expected ‘char * const*’ but argument is of type ‘const char * const*’
> 572 | extern int execve (const char *__path, char *const __argv[],
> | ~~~~~~~~~~~~^~~~~~~~
>
> > +
> > + cat > "$workspace_dir/cpio_list_inner" <<EOF
> > +dir /dev 0755 0 0
> > +dir /proc 0755 0 0
> > +dir /debugfs 0755 0 0
> > +nod /dev/console 0600 0 0 c 5 1
>
> Don't you need /dev/liveupdate node?
That should be created by the kernel itself.
>
> > +file /init $workspace_dir/init 0755 0 0
> > +file /test_binary $workspace_dir/test_binary 0755 0 0
> > +EOF
> > +
> > + # Generate inner_initrd.cpio
> > + "$build_dir/usr/gen_init_cpio" "$workspace_dir/cpio_list_inner" > "$workspace_dir/inner_initrd.cpio"
> > +
> > + cat > "$workspace_dir/cpio_list" <<EOF
> > +dir /dev 0755 0 0
> > +dir /proc 0755 0 0
> > +dir /debugfs 0755 0 0
> > +nod /dev/console 0600 0 0 c 5 1
>
> And here as well.
Not needed.
>
> > +file /init $workspace_dir/init 0755 0 0
> > +file /kernel $kernel_path 0644 0 0
> > +file /test_binary $workspace_dir/test_binary 0755 0 0
> > +file /initrd.img $workspace_dir/inner_initrd.cpio 0644 0 0
> > +EOF
> > +
> > + # Generate the final initrd
> > + "$build_dir/usr/gen_init_cpio" "$workspace_dir/cpio_list" > "$initrd"
> > + local size=$(du -h "$initrd" | cut -f1)
> > +}
> > +
> > +function run_qemu() {
> > + local qemu_cmd=$1
> > + local cmdline=$2
> > + local kernel_path=$3
> > + local serial="$workspace_dir/qemu.serial"
> > +
> > + local accel="-accel tcg"
> > + local host_machine=$(uname -m)
> > +
> > + [[ "$host_machine" == "arm64" ]] && host_machine="aarch64"
> > + [[ "$host_machine" == "x86_64" ]] && host_machine="x86_64"
> > +
> > + if [[ "$qemu_cmd" == *"$host_machine"* ]]; then
> > + if [ -w /dev/kvm ]; then
> > + accel="-accel kvm"
>
> Just pass both kvm and tcg and let qemu complain.
I hated those warnings, this is why I added this "if" in the first place :-)
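
One possible middle ground, sketched under assumptions: keep the check so that KVM is only requested when it is likely to work, but still list TCG after it, since QEMU moves on to the next -accel option when an earlier one fails to initialize. The helper name pick_accel and its optional second and third arguments are hypothetical; they exist only so the logic can be exercised without a real /dev/kvm.

```shell
#!/bin/bash
# Hypothetical helper: print the -accel arguments for a qemu binary name.
#   $1  qemu command, e.g. qemu-system-x86_64
#   $2  host machine (defaults to `uname -m`)
#   $3  KVM device node (defaults to /dev/kvm)
pick_accel() {
	local qemu_cmd="$1"
	local host_machine="${2:-$(uname -m)}"
	local kvm_dev="${3:-/dev/kvm}"

	# Some hosts report "arm64" while qemu binaries are named "aarch64".
	[ "$host_machine" = "arm64" ] && host_machine="aarch64"

	case "$qemu_cmd" in
	*"$host_machine"*)
		if [ -w "$kvm_dev" ]; then
			# KVM first, TCG as the fallback.
			echo "-accel kvm -accel tcg"
			return 0
		fi
		;;
	esac
	echo "-accel tcg"
}
```

Called as `accel=$(pick_accel "$qemu_cmd")`, this yields plain TCG for cross-architecture runs or when the KVM node is not writable, so the warning is avoided in those cases.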
Thank you for your reviews, I am going to send this patch separately
from this series, so let's continue the discussion there.
Pasha