public inbox for linux-kernel@vger.kernel.org
* [RFC PATCH 1/2] seccomp: introduce seccomp_nack_syscall() helper
  2026-04-14 16:47 [RFC PATCH 0/2] seccomp: defer syscall_rollback() to get_signal() Oleg Nesterov
@ 2026-04-14 16:48 ` Oleg Nesterov
  0 siblings, 0 replies; 4+ messages in thread
From: Oleg Nesterov @ 2026-04-14 16:48 UTC
  To: Andy Lutomirski, Kees Cook, Peter Zijlstra, Thomas Gleixner,
	Will Drewry
  Cc: Kusaram Devineni, Max Ver, linux-kernel

Factor the duplicated syscall_rollback() + force_sig_seccomp() sequence
out into a helper. This also simplifies the next change.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/seccomp.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 066909393c38..cb8dd78791cd 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -1256,6 +1256,14 @@ static int seccomp_do_user_notification(int this_syscall,
 	return -1;
 }
 
+static void seccomp_nack_syscall(int this_syscall, int data, bool force_coredump)
+{
+	/* Show the handler or coredump the original registers. */
+	syscall_rollback(current, current_pt_regs());
+	/* Let the filter pass back 16 bits of data. */
+	force_sig_seccomp(this_syscall, data, force_coredump);
+}
+
 static int __seccomp_filter(int this_syscall, const bool recheck_after_trace)
 {
 	u32 filter_ret, action;
@@ -1285,10 +1293,7 @@ static int __seccomp_filter(int this_syscall, const bool recheck_after_trace)
 		goto skip;
 
 	case SECCOMP_RET_TRAP:
-		/* Show the handler the original registers. */
-		syscall_rollback(current, current_pt_regs());
-		/* Let the filter pass back 16 bits of data. */
-		force_sig_seccomp(this_syscall, data, false);
+		seccomp_nack_syscall(this_syscall, data, false);
 		goto skip;
 
 	case SECCOMP_RET_TRACE:
@@ -1360,10 +1365,7 @@ static int __seccomp_filter(int this_syscall, const bool recheck_after_trace)
 		/* Dump core only if this is the last remaining thread. */
 		if (action != SECCOMP_RET_KILL_THREAD ||
 		    (atomic_read(&current->signal->live) == 1)) {
-			/* Show the original registers in the dump. */
-			syscall_rollback(current, current_pt_regs());
-			/* Trigger a coredump with SIGSYS */
-			force_sig_seccomp(this_syscall, data, true);
+			seccomp_nack_syscall(this_syscall, data, true);
 		} else {
 			do_exit(SIGSYS);
 		}
-- 
2.52.0



* [RFC PATCH 0/2] seccomp: drop syscall exit events for rejected syscalls
@ 2026-04-19 15:52 Oleg Nesterov
  2026-04-19 15:53 ` [RFC PATCH 1/2] seccomp: introduce seccomp_nack_syscall() helper Oleg Nesterov
  2026-04-19 15:53 ` [RFC PATCH 2/2] seccomp: drop syscall exit events for rejected syscalls Oleg Nesterov
  0 siblings, 2 replies; 4+ messages in thread
From: Oleg Nesterov @ 2026-04-19 15:52 UTC
  To: Andy Lutomirski, Kees Cook, Peter Zijlstra, Thomas Gleixner,
	Will Drewry
  Cc: Eric Paris, Kusaram Devineni, Max Ver, Paul Moore, audit,
	linux-kernel

On top of [tip: core/urgent] entry: Kill ARCH_SYSCALL_WORK_{ENTER,EXIT}
https://git.kernel.org/tip/7b41ff29c8d386257bae62ad557fd6bad8cc6787

Still RFC, please comment...

Of course this is a user-visible behavior change. ptrace and audit
will no longer see the paired exit events for syscalls rejected with
force_sig_seccomp(). I _hope_ this is fine and better than what we
have now. If this is not acceptable, we can return to "seccomp: defer
syscall_rollback() to get_signal()" we discussed before.

2/2 currently ignores !CONFIG_GENERIC_ENTRY architectures. If this
approach is accepted, it will be simple to update them one-by-one
to sync with the CONFIG_GENERIC_ENTRY case.

Oleg.
---
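
For readers less familiar with the user-space side, here is a minimal
sketch of what "rejected with force_sig_seccomp()" looks like from a
process using SECCOMP_RET_TRAP. It is an illustration only, not part of
the series. The helper name run_trap_demo(), the choice of getppid as
the trapped syscall, and the data value 42 are assumptions made for the
example, and the architecture check a real filter needs is omitted.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#ifndef SYS_SECCOMP
#define SYS_SECCOMP 1	/* si_code for seccomp-generated SIGSYS */
#endif

static volatile sig_atomic_t seen_syscall = -1;
static volatile sig_atomic_t seen_data = -1;

/* Record what the kernel reports about the rejected syscall. */
static void sigsys_handler(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)ucontext;
	if (info->si_code == SYS_SECCOMP) {
		seen_syscall = info->si_syscall;
		/* si_errno carries the SECCOMP_RET_DATA bits of the filter. */
		seen_data = info->si_errno;
	}
}

/*
 * Hypothetical demo helper: trap getppid() and return the filter data
 * seen by the SIGSYS handler, or -1 if setup failed or nothing trapped.
 */
static int run_trap_demo(void)
{
	struct sock_filter filter[] = {
		/* Load the syscall number (no arch check: sketch only). */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, nr)),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_getppid, 0, 1),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP | 42),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = sizeof(filter) / sizeof(filter[0]),
		.filter = filter,
	};
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigsys_handler;
	sa.sa_flags = SA_SIGINFO;
	if (sigaction(SIGSYS, &sa, NULL))
		return -1;
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
		return -1;
	if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog))
		return -1;

	syscall(SYS_getppid);	/* rejected: SIGSYS is delivered instead */

	return seen_syscall == SYS_getppid ? seen_data : -1;
}
```

Where the environment permits installing a filter, the handler should
observe the filter's data value. The syscall itself never runs, which
is why the series argues that no exit event should be reported for it.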

 include/linux/entry-common.h |  9 ++++++++-
 include/linux/thread_info.h  |  2 ++
 kernel/seccomp.c             | 22 ++++++++++++++--------
 3 files changed, 24 insertions(+), 9 deletions(-)



* [RFC PATCH 1/2] seccomp: introduce seccomp_nack_syscall() helper
  2026-04-19 15:52 [RFC PATCH 0/2] seccomp: drop syscall exit events for rejected syscalls Oleg Nesterov
@ 2026-04-19 15:53 ` Oleg Nesterov
  2026-04-19 15:53 ` [RFC PATCH 2/2] seccomp: drop syscall exit events for rejected syscalls Oleg Nesterov
  1 sibling, 0 replies; 4+ messages in thread
From: Oleg Nesterov @ 2026-04-19 15:53 UTC
  To: Andy Lutomirski, Kees Cook, Peter Zijlstra, Thomas Gleixner,
	Will Drewry
  Cc: Eric Paris, Kusaram Devineni, Max Ver, Paul Moore, audit,
	linux-kernel

Factor the duplicated syscall_rollback() + force_sig_seccomp() sequence
out into a helper. This also simplifies the next change.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/seccomp.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 066909393c38..cb8dd78791cd 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -1256,6 +1256,14 @@ static int seccomp_do_user_notification(int this_syscall,
 	return -1;
 }
 
+static void seccomp_nack_syscall(int this_syscall, int data, bool force_coredump)
+{
+	/* Show the handler or coredump the original registers. */
+	syscall_rollback(current, current_pt_regs());
+	/* Let the filter pass back 16 bits of data. */
+	force_sig_seccomp(this_syscall, data, force_coredump);
+}
+
 static int __seccomp_filter(int this_syscall, const bool recheck_after_trace)
 {
 	u32 filter_ret, action;
@@ -1285,10 +1293,7 @@ static int __seccomp_filter(int this_syscall, const bool recheck_after_trace)
 		goto skip;
 
 	case SECCOMP_RET_TRAP:
-		/* Show the handler the original registers. */
-		syscall_rollback(current, current_pt_regs());
-		/* Let the filter pass back 16 bits of data. */
-		force_sig_seccomp(this_syscall, data, false);
+		seccomp_nack_syscall(this_syscall, data, false);
 		goto skip;
 
 	case SECCOMP_RET_TRACE:
@@ -1360,10 +1365,7 @@ static int __seccomp_filter(int this_syscall, const bool recheck_after_trace)
 		/* Dump core only if this is the last remaining thread. */
 		if (action != SECCOMP_RET_KILL_THREAD ||
 		    (atomic_read(&current->signal->live) == 1)) {
-			/* Show the original registers in the dump. */
-			syscall_rollback(current, current_pt_regs());
-			/* Trigger a coredump with SIGSYS */
-			force_sig_seccomp(this_syscall, data, true);
+			seccomp_nack_syscall(this_syscall, data, true);
 		} else {
 			do_exit(SIGSYS);
 		}
-- 
2.52.0



* [RFC PATCH 2/2] seccomp: drop syscall exit events for rejected syscalls
  2026-04-19 15:52 [RFC PATCH 0/2] seccomp: drop syscall exit events for rejected syscalls Oleg Nesterov
  2026-04-19 15:53 ` [RFC PATCH 1/2] seccomp: introduce seccomp_nack_syscall() helper Oleg Nesterov
@ 2026-04-19 15:53 ` Oleg Nesterov
  1 sibling, 0 replies; 4+ messages in thread
From: Oleg Nesterov @ 2026-04-19 15:53 UTC
  To: Andy Lutomirski, Kees Cook, Peter Zijlstra, Thomas Gleixner,
	Will Drewry
  Cc: Eric Paris, Kusaram Devineni, Max Ver, Paul Moore, audit,
	linux-kernel

seccomp_nack_syscall() calls syscall_rollback(), which means that the
syscall exit path sees the original syscall number as the return value.

This confuses audit_syscall_exit(), trace_syscall_exit(), and ptrace,
causing them to report completely bogus syscall exit events.

Add a new SYSCALL_WORK_SECCOMP_EXIT flag set by seccomp_nack_syscall(),
and change syscall_exit_work() to return early if this flag is set. After
all, this syscall was never actually executed.

Note that syscall_exit_work() has to clear SYSCALL_WORK_SECCOMP_EXIT for
the !force_coredump case, and that is why we actually need the new flag:
seccomp_nack_syscall() can't just clear SYSCALL_AUDIT/TRACEPOINT/TRACE.

Reported-by: Max Ver <dudududumaxver@gmail.com>
Closes: https://lore.kernel.org/all/CABjJbFJO+p3jA1r0gjUZrCepQb1Fab3kqxYhc_PSfoqo21ypeQ@mail.gmail.com/
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
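
The set-then-clear handshake described in the changelog above can be
modeled in user space roughly as follows. This is a sketch, not kernel
code: struct model_task, the MODEL_WORK_* bits and the model_*()
functions are hypothetical stand-ins. It also assumes the x86-like case
where syscall_rollback() restores the register that doubles as the
return value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's syscall_work bits. */
#define MODEL_WORK_AUDIT	(1UL << 0)
#define MODEL_WORK_SECCOMP_EXIT	(1UL << 1)

struct model_task {
	unsigned long syscall_work;	/* models thread_info::syscall_work */
	long orig_nr;			/* syscall number saved at entry */
	long ret;			/* return-value register */
	int exit_events;		/* exit events seen by observers */
};

/* Models seccomp_nack_syscall(): roll back, then flag the exit as invalid. */
static void model_nack_syscall(struct model_task *t)
{
	/* Assumption: rollback makes the "return value" the syscall number. */
	t->ret = t->orig_nr;
	t->syscall_work |= MODEL_WORK_SECCOMP_EXIT;
}

/* Models syscall_exit_work(): returns true if an exit event was reported. */
static bool model_exit_work(struct model_task *t)
{
	if (t->syscall_work & MODEL_WORK_SECCOMP_EXIT) {
		/* One-shot flag: clear it so the next syscall reports normally. */
		t->syscall_work &= ~MODEL_WORK_SECCOMP_EXIT;
		return false;
	}
	t->exit_events++;
	return true;
}

/* Walk through one rejected syscall followed by one ordinary syscall. */
static void model_demo(void)
{
	struct model_task t = {
		.syscall_work = MODEL_WORK_AUDIT,
		.orig_nr = 39,
		.ret = -38,
	};

	model_nack_syscall(&t);
	assert(!model_exit_work(&t));		/* rejected: no exit event */
	assert(t.syscall_work == MODEL_WORK_AUDIT);	/* audit bit untouched */
	assert(model_exit_work(&t));		/* next syscall reports again */
	assert(t.exit_events == 1);
}
```

The model keeps the audit bit set across the rejected syscall, which is
the reason given above for a dedicated flag rather than clearing the
AUDIT/TRACEPOINT/TRACE bits.
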
 include/linux/entry-common.h | 9 ++++++++-
 include/linux/thread_info.h  | 2 ++
 kernel/seccomp.c             | 4 ++++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index 535da46c3ee9..403802eed387 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -34,7 +34,8 @@
 				 SYSCALL_WORK_SYSCALL_TRACE |		\
 				 SYSCALL_WORK_SYSCALL_AUDIT |		\
 				 SYSCALL_WORK_SYSCALL_USER_DISPATCH |	\
-				 SYSCALL_WORK_SYSCALL_EXIT_TRAP)
+				 SYSCALL_WORK_SYSCALL_EXIT_TRAP |	\
+				 SYSCALL_WORK_SECCOMP_EXIT)
 
 /**
  * arch_ptrace_report_syscall_entry - Architecture specific ptrace_report_syscall_entry() wrapper
@@ -235,6 +236,12 @@ static __always_inline void syscall_exit_work(struct pt_regs *regs, unsigned lon
 		}
 	}
 
+	if (work & SYSCALL_WORK_SECCOMP_EXIT) {
+		/* Rejected by seccomp, no valid syscall exit state */
+		clear_syscall_work(SECCOMP_EXIT);
+		return;
+	}
+
 	audit_syscall_exit(regs);
 
 	if (work & SYSCALL_WORK_SYSCALL_TRACEPOINT)
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index 051e42902690..167c850ae16e 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -40,6 +40,7 @@ enum {
 #ifdef CONFIG_GENERIC_ENTRY
 enum syscall_work_bit {
 	SYSCALL_WORK_BIT_SECCOMP,
+	SYSCALL_WORK_BIT_SECCOMP_EXIT,
 	SYSCALL_WORK_BIT_SYSCALL_TRACEPOINT,
 	SYSCALL_WORK_BIT_SYSCALL_TRACE,
 	SYSCALL_WORK_BIT_SYSCALL_EMU,
@@ -50,6 +51,7 @@ enum syscall_work_bit {
 };
 
 #define SYSCALL_WORK_SECCOMP			BIT(SYSCALL_WORK_BIT_SECCOMP)
+#define SYSCALL_WORK_SECCOMP_EXIT		BIT(SYSCALL_WORK_BIT_SECCOMP_EXIT)
 #define SYSCALL_WORK_SYSCALL_TRACEPOINT		BIT(SYSCALL_WORK_BIT_SYSCALL_TRACEPOINT)
 #define SYSCALL_WORK_SYSCALL_TRACE		BIT(SYSCALL_WORK_BIT_SYSCALL_TRACE)
 #define SYSCALL_WORK_SYSCALL_EMU		BIT(SYSCALL_WORK_BIT_SYSCALL_EMU)
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index cb8dd78791cd..35703dceb6d2 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -1262,6 +1262,10 @@ static void seccomp_nack_syscall(int this_syscall, int data, bool force_coredump
 	syscall_rollback(current, current_pt_regs());
 	/* Let the filter pass back 16 bits of data. */
 	force_sig_seccomp(this_syscall, data, force_coredump);
+#ifdef CONFIG_GENERIC_ENTRY
+	/* No valid syscall exit state after syscall_rollback() */
+	set_syscall_work(SECCOMP_EXIT);
+#endif
 }
 
 static int __seccomp_filter(int this_syscall, const bool recheck_after_trace)
-- 
2.52.0


