linux-doc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/6] seccomp: support nested listeners
@ 2025-12-01 12:23 Alexander Mikhalitsyn
  2025-12-01 12:23 ` [PATCH v1 1/6] seccomp: remove unused argument from seccomp_do_user_notification Alexander Mikhalitsyn
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Alexander Mikhalitsyn @ 2025-12-01 12:23 UTC (permalink / raw)
  To: kees
  Cc: linux-doc, linux-kernel, linux-kselftest, bpf, Andy Lutomirski,
	Will Drewry, Jonathan Corbet, Shuah Khan, Tycho Andersen,
	Andrei Vagin, Christian Brauner, Stéphane Graber

Dear friends,

this patch series adds support for nested seccomp listeners. It allows container
runtimes and other sandboxing software to install seccomp listeners on top of
existing ones, which is useful for nested LXC containers and other similar use-cases.

I decided to go with conservative approach and limit the maximum number of nested listeners
to 8 per seccomp filter chain (MAX_LISTENERS_PER_PATH). This is done to avoid dynamic memory
allocations in the very hot __seccomp_filter() function, where we use a preallocated static
array on the stack to track matched listeners. 8 nested listeners should be enough for
almost any practical scenarios.

Expecting potential discussions around this patch series, I'm going to present a talk
at LPC 2025 about the design and implementation details of this feature [1].

Git tree (based on for-next/seccomp):
v1: https://github.com/mihalicyn/linux/commits/seccomp.mult.listeners.v1
current: https://github.com/mihalicyn/linux/commits/seccomp.mult.listeners

Link: https://lpc.events/event/19/contributions/2241/ [1]

Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: bpf@vger.kernel.org
Cc: Kees Cook <kees@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tycho Andersen <tycho@tycho.pizza>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Stéphane Graber <stgraber@stgraber.org>

Alexander Mikhalitsyn (6):
  seccomp: remove unused argument from seccomp_do_user_notification
  seccomp: prepare seccomp_run_filters() to support more than one
    listener
  seccomp: limit number of listeners in seccomp tree
  seccomp: handle multiple listeners case
  seccomp: relax has_duplicate_listeners check
  tools/testing/selftests/seccomp: test nested listeners

 .../userspace-api/seccomp_filter.rst          |   6 +
 include/linux/seccomp.h                       |   3 +-
 include/uapi/linux/seccomp.h                  |  13 +-
 kernel/seccomp.c                              |  99 +++++++++--
 tools/include/uapi/linux/seccomp.h            |  13 +-
 tools/testing/selftests/seccomp/seccomp_bpf.c | 162 ++++++++++++++++++
 6 files changed, 269 insertions(+), 27 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v1 1/6] seccomp: remove unused argument from seccomp_do_user_notification
  2025-12-01 12:23 [PATCH v1 0/6] seccomp: support nested listeners Alexander Mikhalitsyn
@ 2025-12-01 12:23 ` Alexander Mikhalitsyn
  2025-12-01 14:19   ` Tycho Andersen
  2025-12-01 12:23 ` [PATCH v1 2/6] seccomp: prepare seccomp_run_filters() to support more than one listener Alexander Mikhalitsyn
  2025-12-01 12:24 ` [PATCH v1 5/6] seccomp: relax has_duplicate_listeners check Alexander Mikhalitsyn
  2 siblings, 1 reply; 11+ messages in thread
From: Alexander Mikhalitsyn @ 2025-12-01 12:23 UTC (permalink / raw)
  To: kees
  Cc: linux-doc, linux-kernel, Andy Lutomirski, Will Drewry,
	Jonathan Corbet, Shuah Khan, Tycho Andersen, Andrei Vagin,
	Christian Brauner, Stéphane Graber, Alexander Mikhalitsyn

Remove unused this_syscall argument from seccomp_do_user_notification()
and add kdoc for it.

Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Kees Cook <kees@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tycho Andersen <tycho@tycho.pizza>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Stéphane Graber <stgraber@stgraber.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
 kernel/seccomp.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 3bbfba30a777..f944ea5a2716 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -1142,8 +1142,18 @@ static bool should_sleep_killable(struct seccomp_filter *match,
 	return match->wait_killable_recv && n->state >= SECCOMP_NOTIFY_SENT;
 }
 
-static int seccomp_do_user_notification(int this_syscall,
-					struct seccomp_filter *match,
+/**
+ * seccomp_do_user_notification - sends seccomp notification to the userspace
+ * listener and waits for a reply.
+ * @match: seccomp filter we are notifying
+ * @sd: seccomp data (syscall_nr, args, etc) to be passed to the userspace listener
+ *
+ * Returns
+ *   - -1 on success if userspace provided a reply for the syscall,
+ *   - -1 on interrupted wait,
+ *   - 0  on success if userspace requested to continue the syscall
+ */
+static int seccomp_do_user_notification(struct seccomp_filter *match,
 					const struct seccomp_data *sd)
 {
 	int err;
@@ -1317,7 +1327,7 @@ static int __seccomp_filter(int this_syscall, const bool recheck_after_trace)
 		return 0;
 
 	case SECCOMP_RET_USER_NOTIF:
-		if (seccomp_do_user_notification(this_syscall, match, &sd))
+		if (seccomp_do_user_notification(match, &sd))
 			goto skip;
 
 		return 0;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v1 2/6] seccomp: prepare seccomp_run_filters() to support more than one listener
  2025-12-01 12:23 [PATCH v1 0/6] seccomp: support nested listeners Alexander Mikhalitsyn
  2025-12-01 12:23 ` [PATCH v1 1/6] seccomp: remove unused argument from seccomp_do_user_notification Alexander Mikhalitsyn
@ 2025-12-01 12:23 ` Alexander Mikhalitsyn
  2025-12-01 14:24   ` Tycho Andersen
  2025-12-02 20:26   ` Kees Cook
  2025-12-01 12:24 ` [PATCH v1 5/6] seccomp: relax has_duplicate_listeners check Alexander Mikhalitsyn
  2 siblings, 2 replies; 11+ messages in thread
From: Alexander Mikhalitsyn @ 2025-12-01 12:23 UTC (permalink / raw)
  To: kees
  Cc: linux-doc, linux-kernel, Andy Lutomirski, Will Drewry,
	Jonathan Corbet, Shuah Khan, Tycho Andersen, Andrei Vagin,
	Christian Brauner, Stéphane Graber, Alexander Mikhalitsyn

Prepare seccomp_run_filters() function to support more than one listener
in the seccomp tree. In this patch, we only introduce a new
struct seccomp_filter_matches with kdoc and modify seccomp_run_filters()
signature correspondingly.

No functional change intended.

Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Kees Cook <kees@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tycho Andersen <tycho@tycho.pizza>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Stéphane Graber <stgraber@stgraber.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
 kernel/seccomp.c | 35 +++++++++++++++++++++++++++++++----
 1 file changed, 31 insertions(+), 4 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index f944ea5a2716..c9a1062a53bd 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -237,6 +237,9 @@ struct seccomp_filter {
 /* Limit any path through the tree to 256KB worth of instructions. */
 #define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter))
 
+/* Limit number of listeners through the tree. */
+#define MAX_LISTENERS_PER_PATH 8
+
 /*
  * Endianness is explicitly ignored and left for BPF program authors to manage
  * as per the specific architecture.
@@ -391,18 +394,38 @@ static inline bool seccomp_cache_check_allow(const struct seccomp_filter *sfilte
 }
 #endif /* SECCOMP_ARCH_NATIVE */
 
+/**
+ * struct seccomp_filter_matches - container for seccomp filter match results
+ *
+ * @n: A number of filters matched.
+ * @filters: An array of (struct seccomp_filter) pointers.
+ *	     Holds pointers to filters that matched during evaluation.
+ *	     A first one in the array is the one with the least permissive
+ *	     action result.
+ *
+ * If final action result is less (or more) permissive than SECCOMP_RET_USER_NOTIF,
+ * only the most restrictive filter is stored in the array's first element.
+ * If final action result is SECCOMP_RET_USER_NOTIF, we need to track
+ * all filters that resulted in the same action to support multiple listeners
+ * in seccomp tree.
+ */
+struct seccomp_filter_matches {
+	unsigned char n;
+	struct seccomp_filter *filters[MAX_LISTENERS_PER_PATH];
+};
+
 #define ACTION_ONLY(ret) ((s32)((ret) & (SECCOMP_RET_ACTION_FULL)))
 /**
  * seccomp_run_filters - evaluates all seccomp filters against @sd
  * @sd: optional seccomp data to be passed to filters
- * @match: stores struct seccomp_filter that resulted in the return value,
+ * @matches: array of struct seccomp_filter pointers that resulted in the return value,
  *         unless filter returned SECCOMP_RET_ALLOW, in which case it will
  *         be unchanged.
  *
  * Returns valid seccomp BPF response codes.
  */
 static u32 seccomp_run_filters(const struct seccomp_data *sd,
-			       struct seccomp_filter **match)
+			       struct seccomp_filter_matches *matches)
 {
 	u32 ret = SECCOMP_RET_ALLOW;
 	/* Make sure cross-thread synced filter points somewhere sane. */
@@ -425,7 +448,8 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
 
 		if (ACTION_ONLY(cur_ret) < ACTION_ONLY(ret)) {
 			ret = cur_ret;
-			*match = f;
+			matches->n = 1;
+			matches->filters[0] = f;
 		}
 	}
 	return ret;
@@ -1252,6 +1276,7 @@ static int __seccomp_filter(int this_syscall, const bool recheck_after_trace)
 {
 	u32 filter_ret, action;
 	struct seccomp_data sd;
+	struct seccomp_filter_matches matches = {};
 	struct seccomp_filter *match = NULL;
 	int data;
 
@@ -1263,7 +1288,9 @@ static int __seccomp_filter(int this_syscall, const bool recheck_after_trace)
 
 	populate_seccomp_data(&sd);
 
-	filter_ret = seccomp_run_filters(&sd, &match);
+	filter_ret = seccomp_run_filters(&sd, &matches);
+
+	match = matches.filters[0];
 	data = filter_ret & SECCOMP_RET_DATA;
 	action = filter_ret & SECCOMP_RET_ACTION_FULL;
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v1 5/6] seccomp: relax has_duplicate_listeners check
  2025-12-01 12:23 [PATCH v1 0/6] seccomp: support nested listeners Alexander Mikhalitsyn
  2025-12-01 12:23 ` [PATCH v1 1/6] seccomp: remove unused argument from seccomp_do_user_notification Alexander Mikhalitsyn
  2025-12-01 12:23 ` [PATCH v1 2/6] seccomp: prepare seccomp_run_filters() to support more than one listener Alexander Mikhalitsyn
@ 2025-12-01 12:24 ` Alexander Mikhalitsyn
  2 siblings, 0 replies; 11+ messages in thread
From: Alexander Mikhalitsyn @ 2025-12-01 12:24 UTC (permalink / raw)
  To: kees
  Cc: linux-doc, linux-kernel, bpf, Andy Lutomirski, Will Drewry,
	Jonathan Corbet, Shuah Khan, Tycho Andersen, Andrei Vagin,
	Christian Brauner, Stéphane Graber, Alexander Mikhalitsyn,
	Alexander Mikhalitsyn

Now everything is ready to get rid of "only one listener per tree"
limitation.

Let's introduce a new uAPI flag
SECCOMP_FILTER_FLAG_ALLOW_NESTED_LISTENERS, so userspace may explicitly
allow nested listeners when installing a listener.

Note, that to install n-th listener, this flag must be set on all
the listeners up the tree.

Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: bpf@vger.kernel.org
Cc: Kees Cook <kees@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tycho Andersen <tycho@tycho.pizza>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Stéphane Graber <stgraber@stgraber.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
 .../userspace-api/seccomp_filter.rst          |  6 +++++
 include/linux/seccomp.h                       |  3 ++-
 include/uapi/linux/seccomp.h                  | 13 ++++++-----
 kernel/seccomp.c                              | 22 +++++++++++++++----
 tools/include/uapi/linux/seccomp.h            | 13 ++++++-----
 5 files changed, 40 insertions(+), 17 deletions(-)

diff --git a/Documentation/userspace-api/seccomp_filter.rst b/Documentation/userspace-api/seccomp_filter.rst
index cff0fa7f3175..b9633ab1ed47 100644
--- a/Documentation/userspace-api/seccomp_filter.rst
+++ b/Documentation/userspace-api/seccomp_filter.rst
@@ -210,6 +210,12 @@ notifications from both tasks will appear on the same filter fd. Reads and
 writes to/from a filter fd are also synchronized, so a filter fd can safely
 have many readers.
 
+By default, only one listener within seccomp filters tree is allowed. On attempt
+to add a new listener when one already exists in the filter tree, the
+``seccomp()`` call will fail with ``-EBUSY``. To allow multiple listeners, the
+``SECCOMP_FILTER_FLAG_ALLOW_NESTED_LISTENERS`` flag can be passed in addition to
+the ``SECCOMP_FILTER_FLAG_NEW_LISTENER`` flag.
+
 The interface for a seccomp notification fd consists of two structures:
 
 .. code-block:: c
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 9b959972bf4a..9b060946019d 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -10,7 +10,8 @@
 					 SECCOMP_FILTER_FLAG_SPEC_ALLOW | \
 					 SECCOMP_FILTER_FLAG_NEW_LISTENER | \
 					 SECCOMP_FILTER_FLAG_TSYNC_ESRCH | \
-					 SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV)
+					 SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV | \
+					 SECCOMP_FILTER_FLAG_ALLOW_NESTED_LISTENERS)
 
 /* sizeof() the first published struct seccomp_notif_addfd */
 #define SECCOMP_NOTIFY_ADDFD_SIZE_VER0 24
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index dbfc9b37fcae..de78d8e7a70b 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -18,13 +18,14 @@
 #define SECCOMP_GET_NOTIF_SIZES		3
 
 /* Valid flags for SECCOMP_SET_MODE_FILTER */
-#define SECCOMP_FILTER_FLAG_TSYNC		(1UL << 0)
-#define SECCOMP_FILTER_FLAG_LOG			(1UL << 1)
-#define SECCOMP_FILTER_FLAG_SPEC_ALLOW		(1UL << 2)
-#define SECCOMP_FILTER_FLAG_NEW_LISTENER	(1UL << 3)
-#define SECCOMP_FILTER_FLAG_TSYNC_ESRCH		(1UL << 4)
+#define SECCOMP_FILTER_FLAG_TSYNC			(1UL << 0)
+#define SECCOMP_FILTER_FLAG_LOG				(1UL << 1)
+#define SECCOMP_FILTER_FLAG_SPEC_ALLOW			(1UL << 2)
+#define SECCOMP_FILTER_FLAG_NEW_LISTENER		(1UL << 3)
+#define SECCOMP_FILTER_FLAG_TSYNC_ESRCH			(1UL << 4)
 /* Received notifications wait in killable state (only respond to fatal signals) */
-#define SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV	(1UL << 5)
+#define SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV		(1UL << 5)
+#define SECCOMP_FILTER_FLAG_ALLOW_NESTED_LISTENERS	(1UL << 6)
 
 /*
  * All BPF programs must return a 32-bit value.
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index ad733f849e0f..348e10d403b1 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -205,6 +205,7 @@ static inline void seccomp_cache_prepare(struct seccomp_filter *sfilter)
  * @log: true if all actions except for SECCOMP_RET_ALLOW should be logged
  * @wait_killable_recv: Put notifying process in killable state once the
  *			notification is received by the userspace listener.
+ * @allow_nested_listeners: Allow nested seccomp listeners.
  * @prev: points to a previously installed, or inherited, filter
  * @prog: the BPF program to evaluate
  * @notif: the struct that holds all notification related information
@@ -226,6 +227,7 @@ struct seccomp_filter {
 	refcount_t users;
 	bool log;
 	bool wait_killable_recv;
+	bool allow_nested_listeners;
 	struct action_cache cache;
 	struct seccomp_filter *prev;
 	struct bpf_prog *prog;
@@ -974,6 +976,10 @@ static long seccomp_attach_filter(unsigned int flags,
 	if (flags & SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV)
 		filter->wait_killable_recv = true;
 
+	/* Set nested listeners allow flag, if present. */
+	if (flags & SECCOMP_FILTER_FLAG_ALLOW_NESTED_LISTENERS)
+		filter->allow_nested_listeners = true;
+
 	/*
 	 * If there is an existing filter, make it the prev and don't drop its
 	 * task reference.
@@ -1955,7 +1961,8 @@ static struct file *init_listener(struct seccomp_filter *filter)
 }
 
 /*
- * Does @new_child have a listener while an ancestor also has a listener?
+ * Does @new_child have a listener while an ancestor also has a listener
+ * and hasn't allowed nesting?
  * If so, we'll want to reject this filter.
  * This only has to be tested for the current process, even in the TSYNC case,
  * because TSYNC installs @child with the same parent on all threads.
@@ -1973,7 +1980,12 @@ static bool has_duplicate_listener(struct seccomp_filter *new_child)
 		return false;
 	for (cur = current->seccomp.filter; cur; cur = cur->prev) {
 		if (cur->notif)
-			return true;
+			/*
+			 * We don't need to go up further, because if there is a
+			 * listener with nesting allowed, then all the listeners
+			 * up the tree have allowed nesting as well.
+			 */
+			return !cur->allow_nested_listeners;
 	}
 
 	return false;
@@ -2018,10 +2030,12 @@ static long seccomp_set_mode_filter(unsigned int flags,
 		return -EINVAL;
 
 	/*
-	 * The SECCOMP_FILTER_FLAG_WAIT_KILLABLE_SENT flag doesn't make sense
+	 * The SECCOMP_FILTER_FLAG_WAIT_KILLABLE_SENT and
+	 * SECCOMP_FILTER_FLAG_ALLOW_NESTED_LISTENERS flags don't make sense
 	 * without the SECCOMP_FILTER_FLAG_NEW_LISTENER flag.
 	 */
-	if ((flags & SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV) &&
+	if (((flags & SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV) ||
+	     (flags & SECCOMP_FILTER_FLAG_ALLOW_NESTED_LISTENERS)) &&
 	    ((flags & SECCOMP_FILTER_FLAG_NEW_LISTENER) == 0))
 		return -EINVAL;
 
diff --git a/tools/include/uapi/linux/seccomp.h b/tools/include/uapi/linux/seccomp.h
index dbfc9b37fcae..de78d8e7a70b 100644
--- a/tools/include/uapi/linux/seccomp.h
+++ b/tools/include/uapi/linux/seccomp.h
@@ -18,13 +18,14 @@
 #define SECCOMP_GET_NOTIF_SIZES		3
 
 /* Valid flags for SECCOMP_SET_MODE_FILTER */
-#define SECCOMP_FILTER_FLAG_TSYNC		(1UL << 0)
-#define SECCOMP_FILTER_FLAG_LOG			(1UL << 1)
-#define SECCOMP_FILTER_FLAG_SPEC_ALLOW		(1UL << 2)
-#define SECCOMP_FILTER_FLAG_NEW_LISTENER	(1UL << 3)
-#define SECCOMP_FILTER_FLAG_TSYNC_ESRCH		(1UL << 4)
+#define SECCOMP_FILTER_FLAG_TSYNC			(1UL << 0)
+#define SECCOMP_FILTER_FLAG_LOG				(1UL << 1)
+#define SECCOMP_FILTER_FLAG_SPEC_ALLOW			(1UL << 2)
+#define SECCOMP_FILTER_FLAG_NEW_LISTENER		(1UL << 3)
+#define SECCOMP_FILTER_FLAG_TSYNC_ESRCH			(1UL << 4)
 /* Received notifications wait in killable state (only respond to fatal signals) */
-#define SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV	(1UL << 5)
+#define SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV		(1UL << 5)
+#define SECCOMP_FILTER_FLAG_ALLOW_NESTED_LISTENERS	(1UL << 6)
 
 /*
  * All BPF programs must return a 32-bit value.
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 1/6] seccomp: remove unused argument from seccomp_do_user_notification
  2025-12-01 12:23 ` [PATCH v1 1/6] seccomp: remove unused argument from seccomp_do_user_notification Alexander Mikhalitsyn
@ 2025-12-01 14:19   ` Tycho Andersen
  2025-12-02 11:56     ` Aleksandr Mikhalitsyn
  0 siblings, 1 reply; 11+ messages in thread
From: Tycho Andersen @ 2025-12-01 14:19 UTC (permalink / raw)
  To: Alexander Mikhalitsyn
  Cc: kees, linux-doc, linux-kernel, Andy Lutomirski, Will Drewry,
	Jonathan Corbet, Shuah Khan, Andrei Vagin, Christian Brauner,
	Stéphane Graber

On Mon, Dec 01, 2025 at 01:23:58PM +0100, Alexander Mikhalitsyn wrote:
> Remove unused this_syscall argument from seccomp_do_user_notification()
> and add kdoc for it.

heh, looks like this was unused since the original commit, whoops.

Reviewed-by: Tycho Andersen (AMD) <tycho@kernel.org>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 2/6] seccomp: prepare seccomp_run_filters() to support more than one listener
  2025-12-01 12:23 ` [PATCH v1 2/6] seccomp: prepare seccomp_run_filters() to support more than one listener Alexander Mikhalitsyn
@ 2025-12-01 14:24   ` Tycho Andersen
  2025-12-02 11:58     ` Aleksandr Mikhalitsyn
  2025-12-02 20:26   ` Kees Cook
  1 sibling, 1 reply; 11+ messages in thread
From: Tycho Andersen @ 2025-12-01 14:24 UTC (permalink / raw)
  To: Alexander Mikhalitsyn
  Cc: kees, linux-doc, linux-kernel, Andy Lutomirski, Will Drewry,
	Jonathan Corbet, Shuah Khan, Andrei Vagin, Christian Brauner,
	Stéphane Graber

On Mon, Dec 01, 2025 at 01:23:59PM +0100, Alexander Mikhalitsyn wrote:
> +/**
> + * struct seccomp_filter_matches - container for seccomp filter match results
> + *
> + * @n: A number of filters matched.
> + * @filters: An array of (struct seccomp_filter) pointers.
> + *	     Holds pointers to filters that matched during evaluation.
> + *	     A first one in the array is the one with the least permissive
> + *	     action result.
> + *
> + * If final action result is less (or more) permissive than SECCOMP_RET_USER_NOTIF,
> + * only the most restrictive filter is stored in the array's first element.
> + * If final action result is SECCOMP_RET_USER_NOTIF, we need to track
> + * all filters that resulted in the same action to support multiple listeners
> + * in seccomp tree.
> + */
> +struct seccomp_filter_matches {
> +	unsigned char n;
> +	struct seccomp_filter *filters[MAX_LISTENERS_PER_PATH];

Maybe a __counted_by() for this?

Tycho

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 1/6] seccomp: remove unused argument from seccomp_do_user_notification
  2025-12-01 14:19   ` Tycho Andersen
@ 2025-12-02 11:56     ` Aleksandr Mikhalitsyn
  0 siblings, 0 replies; 11+ messages in thread
From: Aleksandr Mikhalitsyn @ 2025-12-02 11:56 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: kees, linux-doc, linux-kernel, Andy Lutomirski, Will Drewry,
	Jonathan Corbet, Shuah Khan, Andrei Vagin, Christian Brauner,
	Stéphane Graber

On Mon, Dec 1, 2025 at 3:19 PM Tycho Andersen <tycho@kernel.org> wrote:
>
> On Mon, Dec 01, 2025 at 01:23:58PM +0100, Alexander Mikhalitsyn wrote:
> > Remove unused this_syscall argument from seccomp_do_user_notification()
> > and add kdoc for it.
>
> heh, looks like this was unused since the original commit, whoops.
>
> Reviewed-by: Tycho Andersen (AMD) <tycho@kernel.org>

Added in v2. Thanks, Tycho! ;-)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 2/6] seccomp: prepare seccomp_run_filters() to support more than one listener
  2025-12-01 14:24   ` Tycho Andersen
@ 2025-12-02 11:58     ` Aleksandr Mikhalitsyn
  2025-12-02 14:06       ` Tycho Andersen
  0 siblings, 1 reply; 11+ messages in thread
From: Aleksandr Mikhalitsyn @ 2025-12-02 11:58 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: kees, linux-doc, linux-kernel, Andy Lutomirski, Will Drewry,
	Jonathan Corbet, Shuah Khan, Andrei Vagin, Christian Brauner,
	Stéphane Graber

On Mon, Dec 1, 2025 at 3:24 PM Tycho Andersen <tycho@kernel.org> wrote:
>
> On Mon, Dec 01, 2025 at 01:23:59PM +0100, Alexander Mikhalitsyn wrote:
> > +/**
> > + * struct seccomp_filter_matches - container for seccomp filter match results
> > + *
> > + * @n: A number of filters matched.
> > + * @filters: An array of (struct seccomp_filter) pointers.
> > + *        Holds pointers to filters that matched during evaluation.
> > + *        A first one in the array is the one with the least permissive
> > + *        action result.
> > + *
> > + * If final action result is less (or more) permissive than SECCOMP_RET_USER_NOTIF,
> > + * only the most restrictive filter is stored in the array's first element.
> > + * If final action result is SECCOMP_RET_USER_NOTIF, we need to track
> > + * all filters that resulted in the same action to support multiple listeners
> > + * in seccomp tree.
> > + */
> > +struct seccomp_filter_matches {
> > +     unsigned char n;
> > +     struct seccomp_filter *filters[MAX_LISTENERS_PER_PATH];
>
> Maybe a __counted_by() for this?

I thought that __counted_by() only makes sense for flex arrays, while
in this case we have a static array.

Kind regards,
Alex

>
> Tycho

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 2/6] seccomp: prepare seccomp_run_filters() to support more than one listener
  2025-12-02 11:58     ` Aleksandr Mikhalitsyn
@ 2025-12-02 14:06       ` Tycho Andersen
  0 siblings, 0 replies; 11+ messages in thread
From: Tycho Andersen @ 2025-12-02 14:06 UTC (permalink / raw)
  To: Aleksandr Mikhalitsyn
  Cc: kees, linux-doc, linux-kernel, Andy Lutomirski, Will Drewry,
	Jonathan Corbet, Shuah Khan, Andrei Vagin, Christian Brauner,
	Stéphane Graber

On Tue, Dec 02, 2025 at 12:58:14PM +0100, Aleksandr Mikhalitsyn wrote:
> On Mon, Dec 1, 2025 at 3:24 PM Tycho Andersen <tycho@kernel.org> wrote:
> >
> > On Mon, Dec 01, 2025 at 01:23:59PM +0100, Alexander Mikhalitsyn wrote:
> > > +/**
> > > + * struct seccomp_filter_matches - container for seccomp filter match results
> > > + *
> > > + * @n: A number of filters matched.
> > > + * @filters: An array of (struct seccomp_filter) pointers.
> > > + *        Holds pointers to filters that matched during evaluation.
> > > + *        A first one in the array is the one with the least permissive
> > > + *        action result.
> > > + *
> > > + * If final action result is less (or more) permissive than SECCOMP_RET_USER_NOTIF,
> > > + * only the most restrictive filter is stored in the array's first element.
> > > + * If final action result is SECCOMP_RET_USER_NOTIF, we need to track
> > > + * all filters that resulted in the same action to support multiple listeners
> > > + * in seccomp tree.
> > > + */
> > > +struct seccomp_filter_matches {
> > > +     unsigned char n;
> > > +     struct seccomp_filter *filters[MAX_LISTENERS_PER_PATH];
> >
> > Maybe a __counted_by() for this?
> 
> I thought that __counted_by() only makes sense for flex arrays, while
> in this case we have a static array.

Oh, duh, you're right of course.

Tycho

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 2/6] seccomp: prepare seccomp_run_filters() to support more than one listener
  2025-12-01 12:23 ` [PATCH v1 2/6] seccomp: prepare seccomp_run_filters() to support more than one listener Alexander Mikhalitsyn
  2025-12-01 14:24   ` Tycho Andersen
@ 2025-12-02 20:26   ` Kees Cook
  2025-12-03 15:25     ` Aleksandr Mikhalitsyn
  1 sibling, 1 reply; 11+ messages in thread
From: Kees Cook @ 2025-12-02 20:26 UTC (permalink / raw)
  To: Alexander Mikhalitsyn
  Cc: linux-doc, linux-kernel, Andy Lutomirski, Will Drewry,
	Jonathan Corbet, Shuah Khan, Tycho Andersen, Andrei Vagin,
	Christian Brauner, Stéphane Graber

On Mon, Dec 01, 2025 at 01:23:59PM +0100, Alexander Mikhalitsyn wrote:
> Prepare seccomp_run_filters() function to support more than one listener
> in the seccomp tree. In this patch, we only introduce a new
> struct seccomp_filter_matches with kdoc and modify seccomp_run_filters()
> signature correspondingly.
> 
> No functional change intended.
> 
> Cc: linux-doc@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: Kees Cook <kees@kernel.org>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Will Drewry <wad@chromium.org>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Shuah Khan <shuah@kernel.org>
> Cc: Tycho Andersen <tycho@tycho.pizza>
> Cc: Andrei Vagin <avagin@gmail.com>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Stéphane Graber <stgraber@stgraber.org>
> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> ---
>  kernel/seccomp.c | 35 +++++++++++++++++++++++++++++++----
>  1 file changed, 31 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index f944ea5a2716..c9a1062a53bd 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -237,6 +237,9 @@ struct seccomp_filter {
>  /* Limit any path through the tree to 256KB worth of instructions. */
>  #define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter))
>  
> +/* Limit number of listeners through the tree. */
> +#define MAX_LISTENERS_PER_PATH 8
> +
>  /*
>   * Endianness is explicitly ignored and left for BPF program authors to manage
>   * as per the specific architecture.
> @@ -391,18 +394,38 @@ static inline bool seccomp_cache_check_allow(const struct seccomp_filter *sfilte
>  }
>  #endif /* SECCOMP_ARCH_NATIVE */
>  
> +/**
> + * struct seccomp_filter_matches - container for seccomp filter match results
> + *
> + * @n: A number of filters matched.
> + * @filters: An array of (struct seccomp_filter) pointers.
> + *	     Holds pointers to filters that matched during evaluation.
> + *	     A first one in the array is the one with the least permissive
> + *	     action result.
> + *
> + * If final action result is less (or more) permissive than SECCOMP_RET_USER_NOTIF,
> + * only the most restrictive filter is stored in the array's first element.
> + * If final action result is SECCOMP_RET_USER_NOTIF, we need to track
> + * all filters that resulted in the same action to support multiple listeners
> + * in seccomp tree.
> + */
> +struct seccomp_filter_matches {
> +	unsigned char n;
> +	struct seccomp_filter *filters[MAX_LISTENERS_PER_PATH];
> +};
> +
>  #define ACTION_ONLY(ret) ((s32)((ret) & (SECCOMP_RET_ACTION_FULL)))
>  /**
>   * seccomp_run_filters - evaluates all seccomp filters against @sd
>   * @sd: optional seccomp data to be passed to filters
> - * @match: stores struct seccomp_filter that resulted in the return value,
> + * @matches: array of struct seccomp_filter pointers that resulted in the return value,
>   *         unless filter returned SECCOMP_RET_ALLOW, in which case it will
>   *         be unchanged.
>   *
>   * Returns valid seccomp BPF response codes.
>   */
>  static u32 seccomp_run_filters(const struct seccomp_data *sd,
> -			       struct seccomp_filter **match)
> +			       struct seccomp_filter_matches *matches)
>  {
>  	u32 ret = SECCOMP_RET_ALLOW;
>  	/* Make sure cross-thread synced filter points somewhere sane. */
> @@ -425,7 +448,8 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
>  
>  		if (ACTION_ONLY(cur_ret) < ACTION_ONLY(ret)) {
>  			ret = cur_ret;
> -			*match = f;
> +			matches->n = 1;
> +			matches->filters[0] = f;
>  		}
>  	}
>  	return ret;
> @@ -1252,6 +1276,7 @@ static int __seccomp_filter(int this_syscall, const bool recheck_after_trace)
>  {
>  	u32 filter_ret, action;
>  	struct seccomp_data sd;
> +	struct seccomp_filter_matches matches = {};

I was surprised to see this didn't induce a stack protector check (due
to the array use). It does, however, expand the work done to clear local
variables (i.e. this adds 9 unsigned long zeroings to the default case).

Regardless, I'll read this thread more closely in time for the LPC
session; I'm not exactly opposed to allowing multiple listeners, but I
do want to meditate on the safety logic (which I see you've spent time
thinking about too).

Thanks!

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 2/6] seccomp: prepare seccomp_run_filters() to support more than one listener
  2025-12-02 20:26   ` Kees Cook
@ 2025-12-03 15:25     ` Aleksandr Mikhalitsyn
  0 siblings, 0 replies; 11+ messages in thread
From: Aleksandr Mikhalitsyn @ 2025-12-03 15:25 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-doc, linux-kernel, Andy Lutomirski, Will Drewry,
	Jonathan Corbet, Shuah Khan, Tycho Andersen, Andrei Vagin,
	Christian Brauner, Stéphane Graber

On Tue, Dec 2, 2025 at 9:26 PM Kees Cook <kees@kernel.org> wrote:
>
> On Mon, Dec 01, 2025 at 01:23:59PM +0100, Alexander Mikhalitsyn wrote:
> > Prepare seccomp_run_filters() function to support more than one listener
> > in the seccomp tree. In this patch, we only introduce a new
> > struct seccomp_filter_matches with kdoc and modify seccomp_run_filters()
> > signature correspondingly.
> >
> > No functional change intended.
> >
> > Cc: linux-doc@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: Kees Cook <kees@kernel.org>
> > Cc: Andy Lutomirski <luto@amacapital.net>
> > Cc: Will Drewry <wad@chromium.org>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: Shuah Khan <shuah@kernel.org>
> > Cc: Tycho Andersen <tycho@tycho.pizza>
> > Cc: Andrei Vagin <avagin@gmail.com>
> > Cc: Christian Brauner <brauner@kernel.org>
> > Cc: Stéphane Graber <stgraber@stgraber.org>
> > Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> > ---
> >  kernel/seccomp.c | 35 +++++++++++++++++++++++++++++++----
> >  1 file changed, 31 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> > index f944ea5a2716..c9a1062a53bd 100644
> > --- a/kernel/seccomp.c
> > +++ b/kernel/seccomp.c
> > @@ -237,6 +237,9 @@ struct seccomp_filter {
> >  /* Limit any path through the tree to 256KB worth of instructions. */
> >  #define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter))
> >
> > +/* Limit number of listeners through the tree. */
> > +#define MAX_LISTENERS_PER_PATH 8
> > +
> >  /*
> >   * Endianness is explicitly ignored and left for BPF program authors to manage
> >   * as per the specific architecture.
> > @@ -391,18 +394,38 @@ static inline bool seccomp_cache_check_allow(const struct seccomp_filter *sfilte
> >  }
> >  #endif /* SECCOMP_ARCH_NATIVE */
> >
> > +/**
> > + * struct seccomp_filter_matches - container for seccomp filter match results
> > + *
> > + * @n: A number of filters matched.
> > + * @filters: An array of (struct seccomp_filter) pointers.
> > + *        Holds pointers to filters that matched during evaluation.
> > + *        A first one in the array is the one with the least permissive
> > + *        action result.
> > + *
> > + * If final action result is less (or more) permissive than SECCOMP_RET_USER_NOTIF,
> > + * only the most restrictive filter is stored in the array's first element.
> > + * If final action result is SECCOMP_RET_USER_NOTIF, we need to track
> > + * all filters that resulted in the same action to support multiple listeners
> > + * in seccomp tree.
> > + */
> > +struct seccomp_filter_matches {
> > +     unsigned char n;
> > +     struct seccomp_filter *filters[MAX_LISTENERS_PER_PATH];
> > +};
> > +
> >  #define ACTION_ONLY(ret) ((s32)((ret) & (SECCOMP_RET_ACTION_FULL)))
> >  /**
> >   * seccomp_run_filters - evaluates all seccomp filters against @sd
> >   * @sd: optional seccomp data to be passed to filters
> > - * @match: stores struct seccomp_filter that resulted in the return value,
> > + * @matches: array of struct seccomp_filter pointers that resulted in the return value,
> >   *         unless filter returned SECCOMP_RET_ALLOW, in which case it will
> >   *         be unchanged.
> >   *
> >   * Returns valid seccomp BPF response codes.
> >   */
> >  static u32 seccomp_run_filters(const struct seccomp_data *sd,
> > -                            struct seccomp_filter **match)
> > +                            struct seccomp_filter_matches *matches)
> >  {
> >       u32 ret = SECCOMP_RET_ALLOW;
> >       /* Make sure cross-thread synced filter points somewhere sane. */
> > @@ -425,7 +448,8 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
> >
> >               if (ACTION_ONLY(cur_ret) < ACTION_ONLY(ret)) {
> >                       ret = cur_ret;
> > -                     *match = f;
> > +                     matches->n = 1;
> > +                     matches->filters[0] = f;
> >               }
> >       }
> >       return ret;
> > @@ -1252,6 +1276,7 @@ static int __seccomp_filter(int this_syscall, const bool recheck_after_trace)
> >  {
> >       u32 filter_ret, action;
> >       struct seccomp_data sd;
> > +     struct seccomp_filter_matches matches = {};
>

Hi Kees,

Thanks for looking into this stuff! ;)

> I was surprised to see this didn't induce a stack protector check (due
> to the array use). It does, however, expand the work done to clear local
> variables (i.e. this adds 9 unsigned long zeroings to the default case).

Actually, by saying this you've inspired me to look at this stuff with
a fresh mind again
and I have a feeling that I can probably make it work with just one additional
struct seccomp_filter pointer on the stack, instead of adding extra 8
pointers... :)
Let me make another iteration on this and send a -v3 then.

>I was surprised to see this didn't induce a stack protector check (due
> to the array use).

Also, sorry if my question is stupid, but what do you mean by stack
protector check in
this case? Just a check for an array index before writing to it? Or
something more generic?

>
> Regardless, I'll read this thread more closely in time for the LPC
> session; I'm not exactly opposed to allowing multiple listeners, but I
> do want to meditate on the safety logic (which I see you've spent time
> thinking about too).

Thanks, Kees!

>
> Thanks!

Kind regards,
Alex

>
> -Kees
>
> --
> Kees Cook

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-12-03 15:25 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-01 12:23 [PATCH v1 0/6] seccomp: support nested listeners Alexander Mikhalitsyn
2025-12-01 12:23 ` [PATCH v1 1/6] seccomp: remove unused argument from seccomp_do_user_notification Alexander Mikhalitsyn
2025-12-01 14:19   ` Tycho Andersen
2025-12-02 11:56     ` Aleksandr Mikhalitsyn
2025-12-01 12:23 ` [PATCH v1 2/6] seccomp: prepare seccomp_run_filters() to support more than one listener Alexander Mikhalitsyn
2025-12-01 14:24   ` Tycho Andersen
2025-12-02 11:58     ` Aleksandr Mikhalitsyn
2025-12-02 14:06       ` Tycho Andersen
2025-12-02 20:26   ` Kees Cook
2025-12-03 15:25     ` Aleksandr Mikhalitsyn
2025-12-01 12:24 ` [PATCH v1 5/6] seccomp: relax has_duplicate_listeners check Alexander Mikhalitsyn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).