* [RFC PATCH 0/2] seccomp: drop syscall exit events for rejected syscalls
@ 2026-04-19 15:52 Oleg Nesterov
2026-04-19 15:53 ` [RFC PATCH 1/2] seccomp: introduce seccomp_nack_syscall() helper Oleg Nesterov
2026-04-19 15:53 ` [RFC PATCH 2/2] seccomp: drop syscall exit events for rejected syscalls Oleg Nesterov
0 siblings, 2 replies; 4+ messages in thread
From: Oleg Nesterov @ 2026-04-19 15:52 UTC
To: Andy Lutomirski, Kees Cook, Peter Zijlstra, Thomas Gleixner,
Will Drewry
Cc: Eric Paris, Kusaram Devineni, Max Ver, Paul Moore, audit,
linux-kernel
On top of [tip: core/urgent] entry: Kill ARCH_SYSCALL_WORK_{ENTER,EXIT}
https://git.kernel.org/tip/7b41ff29c8d386257bae62ad557fd6bad8cc6787
Still RFC, please comment...
Of course this is a user-visible behavior change. ptrace and audit
will no longer see the paired exit events for syscalls rejected with
force_sig_seccomp(). I _hope_ this is fine and better than what we
have now. If this is not acceptable, we can return to "seccomp: defer
syscall_rollback() to get_signal()" we discussed before.
2/2 currently ignores !CONFIG_GENERIC_ENTRY architectures. If this
approach is accepted, it will be simple to update them one-by-one
to sync with the CONFIG_GENERIC_ENTRY case.
Oleg.
---
include/linux/entry-common.h | 9 ++++++++-
include/linux/thread_info.h | 2 ++
kernel/seccomp.c | 22 ++++++++++++++--------
3 files changed, 24 insertions(+), 9 deletions(-)
* [RFC PATCH 1/2] seccomp: introduce seccomp_nack_syscall() helper
2026-04-19 15:52 [RFC PATCH 0/2] seccomp: drop syscall exit events for rejected syscalls Oleg Nesterov
@ 2026-04-19 15:53 ` Oleg Nesterov
2026-04-19 15:53 ` [RFC PATCH 2/2] seccomp: drop syscall exit events for rejected syscalls Oleg Nesterov
1 sibling, 0 replies; 4+ messages in thread
From: Oleg Nesterov @ 2026-04-19 15:53 UTC
To: Andy Lutomirski, Kees Cook, Peter Zijlstra, Thomas Gleixner,
Will Drewry
Cc: Eric Paris, Kusaram Devineni, Max Ver, Paul Moore, audit,
linux-kernel
Factor out the duplicated syscall_rollback() + force_sig_seccomp() sequence
to simplify the next change.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/seccomp.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 066909393c38..cb8dd78791cd 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -1256,6 +1256,14 @@ static int seccomp_do_user_notification(int this_syscall,
return -1;
}
+static void seccomp_nack_syscall(int this_syscall, int data, bool force_coredump)
+{
+ /* Show the handler or coredump the original registers. */
+ syscall_rollback(current, current_pt_regs());
+ /* Let the filter pass back 16 bits of data. */
+ force_sig_seccomp(this_syscall, data, force_coredump);
+}
+
static int __seccomp_filter(int this_syscall, const bool recheck_after_trace)
{
u32 filter_ret, action;
@@ -1285,10 +1293,7 @@ static int __seccomp_filter(int this_syscall, const bool recheck_after_trace)
goto skip;
case SECCOMP_RET_TRAP:
- /* Show the handler the original registers. */
- syscall_rollback(current, current_pt_regs());
- /* Let the filter pass back 16 bits of data. */
- force_sig_seccomp(this_syscall, data, false);
+ seccomp_nack_syscall(this_syscall, data, false);
goto skip;
case SECCOMP_RET_TRACE:
@@ -1360,10 +1365,7 @@ static int __seccomp_filter(int this_syscall, const bool recheck_after_trace)
/* Dump core only if this is the last remaining thread. */
if (action != SECCOMP_RET_KILL_THREAD ||
(atomic_read(&current->signal->live) == 1)) {
- /* Show the original registers in the dump. */
- syscall_rollback(current, current_pt_regs());
- /* Trigger a coredump with SIGSYS */
- force_sig_seccomp(this_syscall, data, true);
+ seccomp_nack_syscall(this_syscall, data, true);
} else {
do_exit(SIGSYS);
}
--
2.52.0
* [RFC PATCH 2/2] seccomp: drop syscall exit events for rejected syscalls
2026-04-19 15:52 [RFC PATCH 0/2] seccomp: drop syscall exit events for rejected syscalls Oleg Nesterov
2026-04-19 15:53 ` [RFC PATCH 1/2] seccomp: introduce seccomp_nack_syscall() helper Oleg Nesterov
@ 2026-04-19 15:53 ` Oleg Nesterov
1 sibling, 0 replies; 4+ messages in thread
From: Oleg Nesterov @ 2026-04-19 15:53 UTC
To: Andy Lutomirski, Kees Cook, Peter Zijlstra, Thomas Gleixner,
Will Drewry
Cc: Eric Paris, Kusaram Devineni, Max Ver, Paul Moore, audit,
linux-kernel
seccomp_nack_syscall() calls syscall_rollback(), which means that the
syscall exit path sees the original syscall number as the return value.
This confuses audit_syscall_exit(), trace_syscall_exit(), and ptrace,
causing them to report completely bogus syscall exit events.
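To illustrate the paragraph above, here is a minimal userspace sketch of why the exit path ends up reporting the syscall number. It assumes an x86-like layout where the return value lives in `ax` and the entry-time syscall number is saved in `orig_ax`; `struct model_regs` and every `model_*` name are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the x86-64 struct pt_regs. */
struct model_regs {
	long ax;	/* return value slot; holds -ENOSYS on syscall entry */
	long orig_ax;	/* syscall number saved at entry */
};

/* Models syscall_rollback() on x86: restore the entry-time value of ax. */
void model_syscall_rollback(struct model_regs *regs)
{
	regs->ax = regs->orig_ax;
}

/* Models what exit-side consumers (audit, tracepoint, ptrace) read back. */
long model_syscall_get_return_value(const struct model_regs *regs)
{
	return regs->ax;
}

/* Returns the "return value" an exit hook would see for a rejected syscall. */
long model_rejected_exit_value(long syscall_nr)
{
	struct model_regs regs = { .ax = -38 /* -ENOSYS */, .orig_ax = syscall_nr };

	model_syscall_rollback(&regs);
	return model_syscall_get_return_value(&regs);
}
```

In this model, rejecting syscall number 39 makes the exit hooks see 39 as the "return value", which is the bogus event described above.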
Add a new SYSCALL_WORK_SECCOMP_EXIT flag set by seccomp_nack_syscall(),
and change syscall_exit_work() to return early if this flag is set. After
all, this syscall was never actually executed.
Note that syscall_exit_work() has to clear SYSCALL_WORK_SECCOMP_EXIT for
the !force_coredump case, and that is why we actually need the new flag:
seccomp_nack_syscall() can't just clear SYSCALL_AUDIT/TRACEPOINT/TRACE.
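The flag handling described above can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: `struct model_task`, the `MODEL_WORK_*` bits and the `model_*` functions are hypothetical simplifications of the per-task syscall work word, and the audit hook is reduced to a counter.

```c
#include <assert.h>

/* Hypothetical work bits; the real ones live in enum syscall_work_bit. */
#define MODEL_WORK_AUDIT	(1UL << 0)
#define MODEL_WORK_SECCOMP_EXIT	(1UL << 1)

struct model_task {
	unsigned long syscall_work;	/* models the per-task syscall work word */
	unsigned int exit_events;	/* counts reported syscall exit events */
};

/* Models seccomp_nack_syscall(): mark the exit state as invalid. */
void model_nack_syscall(struct model_task *t)
{
	t->syscall_work |= MODEL_WORK_SECCOMP_EXIT;
}

/* Models syscall_exit_work(): skip reporting once, then re-arm. */
void model_syscall_exit_work(struct model_task *t)
{
	if (t->syscall_work & MODEL_WORK_SECCOMP_EXIT) {
		/* One-shot: clear it so the next syscall is reported normally. */
		t->syscall_work &= ~MODEL_WORK_SECCOMP_EXIT;
		return;
	}

	if (t->syscall_work & MODEL_WORK_AUDIT)
		t->exit_events++;
}
```

The point of the separate bit is that the audit/trace bits stay armed: in the !force_coredump case the task survives the signal, and its next syscall still produces an exit event.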
Reported-by: Max Ver <dudududumaxver@gmail.com>
Closes: https://lore.kernel.org/all/CABjJbFJO+p3jA1r0gjUZrCepQb1Fab3kqxYhc_PSfoqo21ypeQ@mail.gmail.com/
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
include/linux/entry-common.h | 9 ++++++++-
include/linux/thread_info.h | 2 ++
kernel/seccomp.c | 4 ++++
3 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index 535da46c3ee9..403802eed387 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -34,7 +34,8 @@
SYSCALL_WORK_SYSCALL_TRACE | \
SYSCALL_WORK_SYSCALL_AUDIT | \
SYSCALL_WORK_SYSCALL_USER_DISPATCH | \
- SYSCALL_WORK_SYSCALL_EXIT_TRAP)
+ SYSCALL_WORK_SYSCALL_EXIT_TRAP | \
+ SYSCALL_WORK_SECCOMP_EXIT)
/**
* arch_ptrace_report_syscall_entry - Architecture specific ptrace_report_syscall_entry() wrapper
@@ -235,6 +236,12 @@ static __always_inline void syscall_exit_work(struct pt_regs *regs, unsigned lon
}
}
+ if (work & SYSCALL_WORK_SECCOMP_EXIT) {
+ /* Rejected by seccomp, no valid syscall exit state */
+ clear_syscall_work(SECCOMP_EXIT);
+ return;
+ }
+
audit_syscall_exit(regs);
if (work & SYSCALL_WORK_SYSCALL_TRACEPOINT)
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index 051e42902690..167c850ae16e 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -40,6 +40,7 @@ enum {
#ifdef CONFIG_GENERIC_ENTRY
enum syscall_work_bit {
SYSCALL_WORK_BIT_SECCOMP,
+ SYSCALL_WORK_BIT_SECCOMP_EXIT,
SYSCALL_WORK_BIT_SYSCALL_TRACEPOINT,
SYSCALL_WORK_BIT_SYSCALL_TRACE,
SYSCALL_WORK_BIT_SYSCALL_EMU,
@@ -50,6 +51,7 @@ enum syscall_work_bit {
};
#define SYSCALL_WORK_SECCOMP BIT(SYSCALL_WORK_BIT_SECCOMP)
+#define SYSCALL_WORK_SECCOMP_EXIT BIT(SYSCALL_WORK_BIT_SECCOMP_EXIT)
#define SYSCALL_WORK_SYSCALL_TRACEPOINT BIT(SYSCALL_WORK_BIT_SYSCALL_TRACEPOINT)
#define SYSCALL_WORK_SYSCALL_TRACE BIT(SYSCALL_WORK_BIT_SYSCALL_TRACE)
#define SYSCALL_WORK_SYSCALL_EMU BIT(SYSCALL_WORK_BIT_SYSCALL_EMU)
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index cb8dd78791cd..35703dceb6d2 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -1262,6 +1262,10 @@ static void seccomp_nack_syscall(int this_syscall, int data, bool force_coredump
syscall_rollback(current, current_pt_regs());
/* Let the filter pass back 16 bits of data. */
force_sig_seccomp(this_syscall, data, force_coredump);
+#ifdef CONFIG_GENERIC_ENTRY
+ /* No valid syscall exit state after syscall_rollback() */
+ set_syscall_work(SECCOMP_EXIT);
+#endif
}
static int __seccomp_filter(int this_syscall, const bool recheck_after_trace)
--
2.52.0
* [RFC PATCH 0/2] seccomp: defer syscall_rollback() to get_signal()
@ 2026-04-14 16:47 Oleg Nesterov
2026-04-14 16:48 ` [RFC PATCH 1/2] seccomp: introduce seccomp_nack_syscall() helper Oleg Nesterov
0 siblings, 1 reply; 4+ messages in thread
From: Oleg Nesterov @ 2026-04-14 16:47 UTC
To: Andy Lutomirski, Kees Cook, Peter Zijlstra, Thomas Gleixner,
Will Drewry
Cc: Kusaram Devineni, Max Ver, linux-kernel
Kees, Andy, et al, please comment. I think the usage of syscall_rollback()
in __seccomp_filter() is not right.
This is just RFC.
In fact I think that syscall_exit_work() should do nothing if a
syscall was rejected with force_sig_seccomp() by __seccomp_filter().
If nothing else, the syscall was never actually executed.
Perhaps we can add a new SYSCALL_WORK_SYSCALL_XXX to SYSCALL_WORK_EXIT.
seccomp_nack_syscall() can set this flag, and syscall_exit_work() can do
	if (work & SYSCALL_WORK_SYSCALL_XXX) {
		clear_syscall_work(SYSCALL_XXX); // for the !force_coredump case
		return;
	}
after the "if (SYSCALL_WORK_SYSCALL_USER_DISPATCH)" block.
But I didn't dare to do such a change.
What do you think?
Oleg.
* [RFC PATCH 1/2] seccomp: introduce seccomp_nack_syscall() helper
2026-04-14 16:47 [RFC PATCH 0/2] seccomp: defer syscall_rollback() to get_signal() Oleg Nesterov
@ 2026-04-14 16:48 ` Oleg Nesterov
0 siblings, 0 replies; 4+ messages in thread
From: Oleg Nesterov @ 2026-04-14 16:48 UTC
To: Andy Lutomirski, Kees Cook, Peter Zijlstra, Thomas Gleixner,
Will Drewry
Cc: Kusaram Devineni, Max Ver, linux-kernel
Factor out the duplicated syscall_rollback() + force_sig_seccomp() sequence
to simplify the next change.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/seccomp.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 066909393c38..cb8dd78791cd 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -1256,6 +1256,14 @@ static int seccomp_do_user_notification(int this_syscall,
return -1;
}
+static void seccomp_nack_syscall(int this_syscall, int data, bool force_coredump)
+{
+ /* Show the handler or coredump the original registers. */
+ syscall_rollback(current, current_pt_regs());
+ /* Let the filter pass back 16 bits of data. */
+ force_sig_seccomp(this_syscall, data, force_coredump);
+}
+
static int __seccomp_filter(int this_syscall, const bool recheck_after_trace)
{
u32 filter_ret, action;
@@ -1285,10 +1293,7 @@ static int __seccomp_filter(int this_syscall, const bool recheck_after_trace)
goto skip;
case SECCOMP_RET_TRAP:
- /* Show the handler the original registers. */
- syscall_rollback(current, current_pt_regs());
- /* Let the filter pass back 16 bits of data. */
- force_sig_seccomp(this_syscall, data, false);
+ seccomp_nack_syscall(this_syscall, data, false);
goto skip;
case SECCOMP_RET_TRACE:
@@ -1360,10 +1365,7 @@ static int __seccomp_filter(int this_syscall, const bool recheck_after_trace)
/* Dump core only if this is the last remaining thread. */
if (action != SECCOMP_RET_KILL_THREAD ||
(atomic_read(&current->signal->live) == 1)) {
- /* Show the original registers in the dump. */
- syscall_rollback(current, current_pt_regs());
- /* Trigger a coredump with SIGSYS */
- force_sig_seccomp(this_syscall, data, true);
+ seccomp_nack_syscall(this_syscall, data, true);
} else {
do_exit(SIGSYS);
}
--
2.52.0