* [PATCH v2 0/2] seccomp: Fix a race with WAIT_KILLABLE_RECV if the tracer replies too fast
From: Johannes Nixdorf @ 2025-07-25 16:31 UTC
To: Kees Cook, Andy Lutomirski, Will Drewry, Sargun Dhillon,
Shuah Khan
Cc: linux-kernel, Ali Polatel, linux-kselftest, bpf, Johannes Nixdorf
If WAIT_KILLABLE_RECV was specified, and an event is received, the
tracee's syscall is not supposed to be interruptible. This was not properly
ensured if the reply was sent too fast, and an interrupting signal was
received before the reply was processed on the tracee side.
This series fixes the bug and adds a test case for it to the selftests.
Signed-off-by: Johannes Nixdorf <johannes@nixdorf.dev>
---
Changes in v2:
- Added a selftest for the bug.
- Link to v1: https://lore.kernel.org/r/20250723-seccomp-races-v1-1-bef5667ce30a@nixdorf.dev
---
Johannes Nixdorf (2):
seccomp: Fix a race with WAIT_KILLABLE_RECV if the tracer replies too fast
selftests/seccomp: Add a test for the WAIT_KILLABLE_RECV fast reply race
kernel/seccomp.c | 13 ++-
tools/testing/selftests/seccomp/seccomp_bpf.c | 130 ++++++++++++++++++++++++++
2 files changed, 136 insertions(+), 7 deletions(-)
---
base-commit: 89be9a83ccf1f88522317ce02f854f30d6115c41
change-id: 20250721-seccomp-races-e97897d6d94b
Best regards,
--
Johannes Nixdorf <johannes@nixdorf.dev>
* [PATCH v2 1/2] seccomp: Fix a race with WAIT_KILLABLE_RECV if the tracer replies too fast
From: Johannes Nixdorf @ 2025-07-25 16:31 UTC
To: Kees Cook, Andy Lutomirski, Will Drewry, Sargun Dhillon,
Shuah Khan
Cc: linux-kernel, Ali Polatel, linux-kselftest, bpf, Johannes Nixdorf
Normally the tracee starts in SECCOMP_NOTIFY_INIT, sends an
event to the tracer, and starts to wait interruptibly. With
SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV, if the tracer receives the
message (SECCOMP_NOTIFY_SENT is reached) while the tracee is waiting,
and the tracee is subsequently interrupted, the tracee begins to wait
again, uninterruptibly (but killably).
This fails if SECCOMP_NOTIFY_REPLIED is reached before the tracee
is interrupted, because the check only considered SECCOMP_NOTIFY_SENT
as a condition to begin waiting again. In this case the tracee is
interrupted even though the tracer has already acted on its behalf.
This breaks the guarantee SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV is
meant to provide, namely that once the tracer has received the
notification, the tracee's syscall is no longer interrupted or
restarted. Fix this by also considering SECCOMP_NOTIFY_REPLIED when
evaluating whether to switch to uninterruptible waiting.
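The predicate change can be illustrated with a small standalone model. This is a sketch, not kernel code: the enum and function names below are hypothetical, and the model assumes, as the `>=` comparison in the patch does, that the states are declared in the order INIT, SENT, REPLIED.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified model of the seccomp notification states.
 * The ordering INIT < SENT < REPLIED is what makes a ">=" test work.
 */
enum notify_state { NOTIFY_INIT, NOTIFY_SENT, NOTIFY_REPLIED };

/* Pre-fix predicate: only "received but not yet answered" qualifies. */
static bool should_sleep_killable_old(bool wait_killable_recv,
				      enum notify_state state)
{
	return wait_killable_recv && state == NOTIFY_SENT;
}

/* Post-fix predicate: any state at or past SENT qualifies. */
static bool should_sleep_killable_new(bool wait_killable_recv,
				      enum notify_state state)
{
	return wait_killable_recv && state >= NOTIFY_SENT;
}
```

Under these assumptions the two predicates disagree only for `(true, NOTIFY_REPLIED)`, which is exactly the fast-reply case described above.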
With the condition changed, the loop in seccomp_do_user_notification()
would exit immediately after deciding that uninterruptible waiting is
required if the notification had already reached SECCOMP_NOTIFY_REPLIED,
skipping the code that first processes pending addfd commands. Prevent
this by executing the remainder of the loop body one last time in this
case.
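The restructuring is needed because of a C detail: `continue` inside a `do { } while (cond)` loop jumps to the evaluation of `cond`, not back to the top of the body. The sketch below models one fast-reply race, in which the tracer receives the notification and replies (with an addfd request attached) while the tracee sits in its interruptible wait. It compares the loop using the new predicate but the old `continue` against the restructured loop. All names are hypothetical, and the model only tracks whether the queued addfd request is processed before the loop exits.

```c
#include <stdbool.h>

enum notify_state { NOTIFY_INIT, NOTIFY_SENT, NOTIFY_REPLIED };

/* Hypothetical snapshot of what the tracee's wait loop observes. */
struct model {
	enum notify_state state;      /* current notification state */
	enum notify_state after_wait; /* state the tracer leaves it in */
	bool addfd_queued;            /* tracer queued an addfd request */
	bool addfd_done;              /* tracee processed that request */
	bool signal_pending;          /* next interruptible sleep is cut short */
};

/* WAIT_KILLABLE_RECV assumed set; post-fix ">=" predicate. */
static bool sleep_killable(const struct model *m)
{
	return m->state >= NOTIFY_SENT;
}

/*
 * Returns 0 when the loop finishes, -1 for "interrupted".
 * fixed == false: new predicate, but the old "continue" control flow.
 * fixed == true:  fall through to the addfd handling instead.
 */
static int wait_for_reply(struct model *m, bool fixed)
{
	do {
		/* Decided before sleeping, from the state seen at that time. */
		bool wait_killable = sleep_killable(m);
		bool err, switch_now;

		/* "Sleep": meanwhile the tracer advances the state. */
		m->state = m->after_wait;
		/* Any later sleep ends with the tracer's reply. */
		m->after_wait = NOTIFY_REPLIED;

		/* Only an interruptible sleep is cut short by the signal. */
		err = !wait_killable && m->signal_pending;
		m->signal_pending = false;

		if (err) {
			switch_now = !wait_killable && sleep_killable(m);
			if (!fixed) {
				if (switch_now)
					continue; /* jumps to the while () test */
				return -1;
			}
			if (!switch_now)
				return -1;
		}

		/* Process an addfd request queued by the tracer. */
		if (m->addfd_queued) {
			m->addfd_queued = false;
			m->addfd_done = true;
		}
	} while (m->state != NOTIFY_REPLIED);

	return 0;
}
```

With `fixed == false` the call returns 0 but `addfd_done` stays false; with `fixed == true` the addfd request is processed before the loop exits. An interruption that arrives before the tracer has received the notification still yields -1 in both variants.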
Fixes: c2aa2dfef243 ("seccomp: Add wait_killable semantic to seccomp user notifier")
Reported-by: Ali Polatel <alip@chesswob.org>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220291
Signed-off-by: Johannes Nixdorf <johannes@nixdorf.dev>
---
kernel/seccomp.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 41aa761c7738cefe01ca755f78f12844d7186e2a..fa44bcb6aa47df88bdc5951217d99779bd56ab70 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -1139,7 +1139,7 @@ static void seccomp_handle_addfd(struct seccomp_kaddfd *addfd, struct seccomp_kn
static bool should_sleep_killable(struct seccomp_filter *match,
struct seccomp_knotif *n)
{
- return match->wait_killable_recv && n->state == SECCOMP_NOTIFY_SENT;
+ return match->wait_killable_recv && n->state >= SECCOMP_NOTIFY_SENT;
}
static int seccomp_do_user_notification(int this_syscall,
@@ -1186,13 +1186,12 @@ static int seccomp_do_user_notification(int this_syscall,
if (err != 0) {
/*
- * Check to see if the notifcation got picked up and
- * whether we should switch to wait killable.
+ * Check to see whether we should switch to wait
+ * killable. Only return the interrupted error if not.
*/
- if (!wait_killable && should_sleep_killable(match, &n))
- continue;
-
- goto interrupted;
+ if (!(!wait_killable && should_sleep_killable(match,
+ &n)))
+ goto interrupted;
}
addfd = list_first_entry_or_null(&n.addfd,
--
2.50.1
* [PATCH v2 2/2] selftests/seccomp: Add a test for the WAIT_KILLABLE_RECV fast reply race
From: Johannes Nixdorf @ 2025-07-25 16:31 UTC
To: Kees Cook, Andy Lutomirski, Will Drewry, Sargun Dhillon,
Shuah Khan
Cc: linux-kernel, Ali Polatel, linux-kselftest, bpf, Johannes Nixdorf
If WAIT_KILLABLE_RECV was specified, and an event is received, the
tracee's syscall is not supposed to be interruptible. This was not properly
ensured if the reply was sent too fast, and an interrupting signal was
received before the reply was processed on the tracee side.
Add a test for this, which consists of:
- a tracee with a timer that keeps sending it signals while repeatedly
running a traced syscall in a loop,
- a tracer that repeatedly handles all syscalls from the tracee in a
loop, and
- a shared pipe between both, on which the tracee sends one byte per
syscall attempted and the tracer reads one byte per syscall handled.
If, due to this bug, the tracee's syscall is restarted after the tracer
has received the event for it, the tracee will not have sent a second
token on the pipe; the tracer notices the missing token and fails the
test.
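The tracer's detection step reduces to a `poll()` with a timeout followed by a one-byte `read()`. The helper below is a hypothetical sketch in that shape: if a syscall is restarted after RECV, the tracer's next `poll()` finds no token and times out.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Take one token from the read end of a pipe, waiting at most
 * 'timeout_ms'. Returns 1 if a token was consumed, 0 if none arrived
 * in time (the selftest treats that as a restarted syscall), and -1 on
 * error or if the write end was closed.
 */
static int take_token(int rfd, int timeout_ms)
{
	struct pollfd pfd = { .fd = rfd, .events = POLLIN };
	char c;
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret <= 0)
		return ret; /* 0: timeout, -1: poll() error */
	return read(rfd, &c, sizeof(c)) == 1 ? 1 : -1;
}
```

The selftest uses a 1000 ms timeout, which accounts for the extra second a failing run takes.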
The test also uses SECCOMP_IOCTL_NOTIF_ADDFD with SECCOMP_ADDFD_FLAG_SEND
for the reply: the fix includes an additional code path change for
handling addfd that a plain SECCOMP_IOCTL_NOTIF_SEND would not exercise,
and it is possible to fix the bug while leaving the same race intact for
the addfd case.
This test is not guaranteed to reproduce the bug on every run, but the
parameters (signal frequency and number of repeated syscalls) have been
chosen so that on my machine this test:
- takes ~0.8s in the good case (+1s in the failure case), and
- detects the bug in 999 of 1000 runs.
Signed-off-by: Johannes Nixdorf <johannes@nixdorf.dev>
---
tools/testing/selftests/seccomp/seccomp_bpf.c | 130 ++++++++++++++++++++++++++
1 file changed, 130 insertions(+)
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 61acbd45ffaaf87b180c8dff2324a02282356fcd..b24d0cbe88b4499a7635c6a075bfc6a660409792 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -3547,6 +3547,10 @@ static void signal_handler(int signal)
perror("write from signal");
}
+static void signal_handler_nop(int signal)
+{
+}
+
TEST(user_notification_signal)
{
pid_t pid;
@@ -4819,6 +4823,132 @@ TEST(user_notification_wait_killable_fatal)
EXPECT_EQ(SIGTERM, WTERMSIG(status));
}
+/* Ensure signals after the reply do not interrupt */
+TEST(user_notification_wait_killable_after_reply)
+{
+ int i, max_iter = 100000;
+ int listener, status;
+ int pipe_fds[2];
+ pid_t pid;
+ long ret;
+
+ ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+ ASSERT_EQ(0, ret)
+ {
+ TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
+ }
+
+ listener = user_notif_syscall(
+ __NR_dup, SECCOMP_FILTER_FLAG_NEW_LISTENER |
+ SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV);
+ ASSERT_GE(listener, 0);
+
+ /*
+ * Used to count invocations. One token is transferred from the child
+ * to the parent per syscall invocation, the parent tries to take
+ * one token per successful RECV. If the syscall is restarted after
+ * RECV the parent will try to get two tokens while the child only
+ * provided one.
+ */
+ ASSERT_EQ(pipe(pipe_fds), 0);
+
+ pid = fork();
+ ASSERT_GE(pid, 0);
+
+ if (pid == 0) {
+ struct sigaction new_action = {
+ .sa_handler = signal_handler_nop,
+ .sa_flags = SA_RESTART,
+ };
+ struct itimerval timer = {
+ .it_value = { .tv_usec = 1000 },
+ .it_interval = { .tv_usec = 1000 },
+ };
+ char c = 'a';
+
+ close(pipe_fds[0]);
+
+ /* Setup the sigaction with SA_RESTART */
+ if (sigaction(SIGALRM, &new_action, NULL)) {
+ perror("sigaction");
+ exit(1);
+ }
+
+ /*
+ * Kill with SIGALRM repeatedly, to try to hit the race when
+ * handling the syscall.
+ */
+ if (setitimer(ITIMER_REAL, &timer, NULL) < 0)
+ perror("setitimer");
+
+ for (i = 0; i < max_iter; ++i) {
+ int fd;
+
+ /* Send one token per iteration to catch repeats. */
+ if (write(pipe_fds[1], &c, sizeof(c)) != 1) {
+ perror("write");
+ exit(1);
+ }
+
+ fd = syscall(__NR_dup, 0);
+ if (fd < 0) {
+ perror("dup");
+ exit(1);
+ }
+ close(fd);
+ }
+
+ exit(0);
+ }
+
+ close(pipe_fds[1]);
+
+ for (i = 0; i < max_iter; ++i) {
+ struct seccomp_notif req = {};
+ struct seccomp_notif_addfd addfd = {};
+ struct pollfd pfd = {
+ .fd = pipe_fds[0],
+ .events = POLLIN,
+ };
+ char c;
+
+ /*
+ * Try to receive one token. If it failed, one child syscall
+ * was restarted after RECV and needed to be handled twice.
+ */
+ ASSERT_EQ(poll(&pfd, 1, 1000), 1)
+ kill(pid, SIGKILL);
+
+ ASSERT_EQ(read(pipe_fds[0], &c, sizeof(c)), 1)
+ kill(pid, SIGKILL);
+
+ /*
+ * Get the notification, reply to it as fast as possible to test
+ * whether the child wrongly skips going into the non-preemptible
+ * (TASK_KILLABLE) state.
+ */
+ do
+ ret = ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req);
+ while (ret < 0 && errno == ENOENT); /* Accept interruptions before RECV */
+ ASSERT_EQ(ret, 0)
+ kill(pid, SIGKILL);
+
+ addfd.id = req.id;
+ addfd.flags = SECCOMP_ADDFD_FLAG_SEND;
+ addfd.srcfd = 0;
+ ASSERT_GE(ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd), 0)
+ kill(pid, SIGKILL);
+ }
+
+ /*
+ * Wait for the process to exit, and make sure the process terminated
+ * with a zero exit code.
+ */
+ EXPECT_EQ(waitpid(pid, &status, 0), pid);
+ EXPECT_EQ(true, WIFEXITED(status));
+ EXPECT_EQ(0, WEXITSTATUS(status));
+}
+
struct tsync_vs_thread_leader_args {
pthread_t leader;
};
--
2.50.1
* Re: [PATCH v2 2/2] selftests/seccomp: Add a test for the WAIT_KILLABLE_RECV fast reply race
From: Kees Cook @ 2025-07-29 1:07 UTC
To: Johannes Nixdorf
Cc: Andy Lutomirski, Will Drewry, Sargun Dhillon, Shuah Khan,
linux-kernel, Ali Polatel, linux-kselftest, bpf
On Fri, Jul 25, 2025 at 06:31:19PM +0200, Johannes Nixdorf wrote:
> + struct itimerval timer = {
> + .it_value = { .tv_usec = 1000 },
> + .it_interval = { .tv_usec = 1000 },
> + };
To get this to build, I needed to add a sys/time.h include:
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index b24d0cbe88b4..fc4910d35342 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -24,6 +24,7 @@
#include <linux/filter.h>
#include <sys/prctl.h>
#include <sys/ptrace.h>
+#include <sys/time.h>
#include <sys/user.h>
#include <linux/prctl.h>
#include <linux/ptrace.h>
But, with that, yes, I can confirm the race and the fix. Thank you!
I can fix that up locally.
-Kees
--
Kees Cook
* Re: [PATCH v2 2/2] selftests/seccomp: Add a test for the WAIT_KILLABLE_RECV fast reply race
From: Johannes Nixdorf @ 2025-07-29 16:45 UTC
To: Kees Cook, Johannes Nixdorf
Cc: Andy Lutomirski, Will Drewry, Sargun Dhillon, Shuah Khan,
linux-kernel, Ali Polatel, linux-kselftest, bpf
On Tue Jul 29, 2025 at 3:07 AM CEST, Kees Cook wrote:
> On Fri, Jul 25, 2025 at 06:31:19PM +0200, Johannes Nixdorf wrote:
>> + struct itimerval timer = {
>> + .it_value = { .tv_usec = 1000 },
>> + .it_interval = { .tv_usec = 1000 },
>> + };
>
> To get this to build, I needed to add a sys/time.h include:
>
> diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
> index b24d0cbe88b4..fc4910d35342 100644
> --- a/tools/testing/selftests/seccomp/seccomp_bpf.c
> +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
> @@ -24,6 +24,7 @@
> #include <linux/filter.h>
> #include <sys/prctl.h>
> #include <sys/ptrace.h>
> +#include <sys/time.h>
> #include <sys/user.h>
> #include <linux/prctl.h>
> #include <linux/ptrace.h>
>
> But, with that, yes, I can confirm the race and the fix. Thank you!
> I can fix that up locally.
Sounds good. The change looks correct to me as well.
>
> -Kees
Best regards,
Johannes
* Re: [PATCH v2 0/2] seccomp: Fix a race with WAIT_KILLABLE_RECV if the tracer replies too fast
From: Kees Cook @ 2025-07-29 20:34 UTC
To: Andy Lutomirski, Will Drewry, Sargun Dhillon, Shuah Khan,
Johannes Nixdorf
Cc: Kees Cook, linux-kernel, Ali Polatel, linux-kselftest, bpf
On Fri, 25 Jul 2025 18:31:17 +0200, Johannes Nixdorf wrote:
> If WAIT_KILLABLE_RECV was specified, and an event is received, the
> tracee's syscall is not supposed to be interruptible. This was not properly
> ensured if the reply was sent too fast, and an interrupting signal was
> received before the reply was processed on the tracee side.
>
> This series fixes the bug and adds a test case for it to the selftests.
>
> [...]
With minor edits, applied to for-next/seccomp, thanks!
[1/2] seccomp: Fix a race with WAIT_KILLABLE_RECV if the tracer replies too fast
https://git.kernel.org/kees/c/cce436aafc2a
[2/2] selftests/seccomp: Add a test for the WAIT_KILLABLE_RECV fast reply race
https://git.kernel.org/kees/c/b0c9bfbab925
Take care,
--
Kees Cook