[PATCH 0/3 v2] seccomp: improve handling of SECCOMP_IOCTL_NOTIF

public inbox for linux-kernel@vger.kernel.org

* [PATCH 0/3 v2] seccomp: improve handling of SECCOMP_IOCTL_NOTIF_RECV
@ 2024-05-23  1:45 Andrei Vagin
  2024-05-23  1:45 ` [PATCH 1/3] seccomp: interrupt SECCOMP_IOCTL_NOTIF_RECV when all users have exited Andrei Vagin
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Andrei Vagin @ 2024-05-23  1:45 UTC
  To: Kees Cook, Andy Lutomirski, Will Drewry, Oleg Nesterov,
	Christian Brauner
  Cc: linux-kernel, Tycho Andersen, Andrei Vagin, Jens Axboe

This patch set addresses two problems with the SECCOMP_IOCTL_NOTIF_RECV
ioctl:
* it doesn't return when the seccomp filter becomes unused (all tasks
  have exited).
* EPOLLHUP is triggered not when a task exits, but rather when its zombie
  is collected.
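
For readers who have not used the listener API, here is a minimal
userspace sketch of how a supervisor might react to the behaviour
described in the first bullet. The enum and both helper names are
hypothetical and are not part of this series. ENOENT is treated as
"filter unused" because that is what the selftest in 3/3 expects, but a
real supervisor may want to confirm it with poll() and POLLHUP, since
ENOENT can have other causes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/seccomp.h>

/* Hypothetical outcome codes for one receive attempt. */
enum recv_outcome {
	RECV_GOT_NOTIFICATION,	/* req holds a pending syscall */
	RECV_FILTER_UNUSED,	/* ENOENT: assumed to mean no users are left */
	RECV_RETRY,		/* EINTR: interrupted by a signal */
	RECV_ERROR,		/* anything else */
};

/* Map an ioctl() return value and errno to an outcome. */
static enum recv_outcome classify_recv(int ret, int err)
{
	if (ret == 0)
		return RECV_GOT_NOTIFICATION;
	if (err == ENOENT)
		return RECV_FILTER_UNUSED;
	if (err == EINTR)
		return RECV_RETRY;
	return RECV_ERROR;
}

/* Block in SECCOMP_IOCTL_NOTIF_RECV once; the caller must zero *req. */
static enum recv_outcome recv_once(int listener, struct seccomp_notif *req)
{
	int ret;

	assert(req != NULL);
	ret = ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, req);
	return classify_recv(ret, errno);
}
```

A supervisor loop would call recv_once() repeatedly, handle each
notification, and stop once it sees RECV_FILTER_UNUSED.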

v2: - Remove unnecessary checks of PF_EXITING.
    - Take siglock with disabling irqs.
    Thanks to Oleg for the review and the help with the first version.

Andrei Vagin (3):
  seccomp: interrupt SECCOMP_IOCTL_NOTIF_RECV when all users have exited
  seccomp: release task filters when the task exits
  selftests/seccomp: add test for NOTIF_RECV and unused filters

 kernel/exit.c                                 |  3 +-
 kernel/seccomp.c                              | 38 ++++++++++---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 54 +++++++++++++++++++
 3 files changed, 88 insertions(+), 7 deletions(-)

Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tycho Andersen <tandersen@netflix.com>


-- 
2.45.0.rc1.225.g2a3ae87e7f-goog



* [PATCH 1/3] seccomp: interrupt SECCOMP_IOCTL_NOTIF_RECV when all users have exited
  2024-05-23  1:45 [PATCH 0/3 v2] seccomp: improve handling of SECCOMP_IOCTL_NOTIF_RECV Andrei Vagin
@ 2024-05-23  1:45 ` Andrei Vagin
  2024-05-23  8:59   ` Oleg Nesterov
  2024-05-23  1:45 ` [PATCH 2/3] seccomp: release task filters when the task exits Andrei Vagin
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Andrei Vagin @ 2024-05-23  1:45 UTC
  To: Kees Cook, Andy Lutomirski, Will Drewry, Oleg Nesterov,
	Christian Brauner
  Cc: linux-kernel, Tycho Andersen, Andrei Vagin, Jens Axboe

SECCOMP_IOCTL_NOTIF_RECV promptly returns when a seccomp filter becomes
unused, as a filter without users can't trigger any events.

Previously, event listeners had to rely on epoll to detect when all
processes had exited.

The change builds on commit 99cdb8b9a573 ("seccomp: notify about
unused filter"), which implemented (E)POLLHUP notifications.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Andrei Vagin <avagin@google.com>
---
 kernel/seccomp.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index f70e031e06a8..35435e8f1035 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -1466,7 +1466,7 @@ static int recv_wake_function(wait_queue_entry_t *wait, unsigned int mode, int s
 				  void *key)
 {
 	/* Avoid a wakeup if event not interesting for us. */
-	if (key && !(key_to_poll(key) & (EPOLLIN | EPOLLERR)))
+	if (key && !(key_to_poll(key) & (EPOLLIN | EPOLLERR | EPOLLHUP)))
 		return 0;
 	return autoremove_wake_function(wait, mode, sync, key);
 }
@@ -1476,6 +1476,9 @@ static int recv_wait_event(struct seccomp_filter *filter)
 	DEFINE_WAIT_FUNC(wait, recv_wake_function);
 	int ret;
 
+	if (refcount_read(&filter->users) == 0)
+		return 0;
+
 	if (atomic_dec_if_positive(&filter->notif->requests) >= 0)
 		return 0;
 
@@ -1484,6 +1487,8 @@ static int recv_wait_event(struct seccomp_filter *filter)
 
 		if (atomic_dec_if_positive(&filter->notif->requests) >= 0)
 			break;
+		if (refcount_read(&filter->users) == 0)
+			break;
 
 		if (ret)
 			return ret;
-- 
2.45.1.288.g0e0cd299f1-goog



* [PATCH 2/3] seccomp: release task filters when the task exits
  2024-05-23  1:45 [PATCH 0/3 v2] seccomp: improve handling of SECCOMP_IOCTL_NOTIF_RECV Andrei Vagin
  2024-05-23  1:45 ` [PATCH 1/3] seccomp: interrupt SECCOMP_IOCTL_NOTIF_RECV when all users have exited Andrei Vagin
@ 2024-05-23  1:45 ` Andrei Vagin
  2024-05-23  9:00   ` Oleg Nesterov
  2024-06-26 18:57   ` Kees Cook
  2024-05-23  1:45 ` [PATCH 3/3] selftests/seccomp: add test for NOTIF_RECV and unused filters Andrei Vagin
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 12+ messages in thread
From: Andrei Vagin @ 2024-05-23  1:45 UTC
  To: Kees Cook, Andy Lutomirski, Will Drewry, Oleg Nesterov,
	Christian Brauner
  Cc: linux-kernel, Tycho Andersen, Andrei Vagin, Jens Axboe

Previously, seccomp filters were released in release_task(), which
required the process to exit and its zombie to be collected. However,
exited threads/processes can't trigger any seccomp events, making it
more logical to release filters upon task exits.

This adjustment simplifies scenarios where a parent is tracing its child
process. The parent process can now handle all events from a seccomp
listening descriptor and then call wait to collect a child zombie.

seccomp_filter_release takes the siglock to avoid races with
seccomp_sync_threads. There was an idea to bypass taking the lock by
checking PF_EXITING, but it can be set without holding siglock if
threads have SIGNAL_GROUP_EXIT. This means it can happen concurrently
with seccomp_filter_release.

Signed-off-by: Andrei Vagin <avagin@google.com>
---
 kernel/exit.c    |  3 ++-
 kernel/seccomp.c | 22 ++++++++++++++++------
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 41a12630cbbc..23439c021d8d 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -278,7 +278,6 @@ void release_task(struct task_struct *p)
 	}
 
 	write_unlock_irq(&tasklist_lock);
-	seccomp_filter_release(p);
 	proc_flush_pid(thread_pid);
 	put_pid(thread_pid);
 	release_thread(p);
@@ -836,6 +835,8 @@ void __noreturn do_exit(long code)
 	io_uring_files_cancel();
 	exit_signals(tsk);  /* sets PF_EXITING */
 
+	seccomp_filter_release(tsk);
+
 	acct_update_integrals(tsk);
 	group_dead = atomic_dec_and_test(&tsk->signal->live);
 	if (group_dead) {
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 35435e8f1035..67305e776dd3 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -502,6 +502,9 @@ static inline pid_t seccomp_can_sync_threads(void)
 		/* Skip current, since it is initiating the sync. */
 		if (thread == caller)
 			continue;
+		/* Skip exited threads. */
+		if (thread->flags & PF_EXITING)
+			continue;
 
 		if (thread->seccomp.mode == SECCOMP_MODE_DISABLED ||
 		    (thread->seccomp.mode == SECCOMP_MODE_FILTER &&
@@ -563,18 +566,18 @@ static void __seccomp_filter_release(struct seccomp_filter *orig)
  * @tsk: task the filter should be released from.
  *
  * This function should only be called when the task is exiting as
- * it detaches it from its filter tree. As such, READ_ONCE() and
- * barriers are not needed here, as would normally be needed.
+ * it detaches it from its filter tree. PF_EXITING has to be set
+ * for the task.
  */
 void seccomp_filter_release(struct task_struct *tsk)
 {
-	struct seccomp_filter *orig = tsk->seccomp.filter;
-
-	/* We are effectively holding the siglock by not having any sighand. */
-	WARN_ON(tsk->sighand != NULL);
+	struct seccomp_filter *orig;
 
+	spin_lock_irq(&current->sighand->siglock);
+	orig = tsk->seccomp.filter;
 	/* Detach task from its filter tree. */
 	tsk->seccomp.filter = NULL;
+	spin_unlock_irq(&current->sighand->siglock);
 	__seccomp_filter_release(orig);
 }
 
@@ -602,6 +605,13 @@ static inline void seccomp_sync_threads(unsigned long flags)
 		if (thread == caller)
 			continue;
 
+		/*
+		 * Skip exited threads. seccomp_filter_release could have
+		 * been already called for this task.
+		 */
+		if (thread->flags & PF_EXITING)
+			continue;
+
 		/* Get a task reference for the new leaf node. */
 		get_seccomp_filter(caller);
 
-- 
2.45.1.288.g0e0cd299f1-goog



* [PATCH 3/3] selftests/seccomp: add test for NOTIF_RECV and unused filters
  2024-05-23  1:45 [PATCH 0/3 v2] seccomp: improve handling of SECCOMP_IOCTL_NOTIF_RECV Andrei Vagin
  2024-05-23  1:45 ` [PATCH 1/3] seccomp: interrupt SECCOMP_IOCTL_NOTIF_RECV when all users have exited Andrei Vagin
  2024-05-23  1:45 ` [PATCH 2/3] seccomp: release task filters when the task exits Andrei Vagin
@ 2024-05-23  1:45 ` Andrei Vagin
  2024-05-23  9:33 ` [PATCH 0/3 v2] seccomp: improve handling of SECCOMP_IOCTL_NOTIF_RECV Oleg Nesterov
  2024-06-25  0:17 ` Andrei Vagin
  4 siblings, 0 replies; 12+ messages in thread
From: Andrei Vagin @ 2024-05-23  1:45 UTC
  To: Kees Cook, Andy Lutomirski, Will Drewry, Oleg Nesterov,
	Christian Brauner
  Cc: linux-kernel, Tycho Andersen, Andrei Vagin, Jens Axboe

Add a new test case to check that SECCOMP_IOCTL_NOTIF_RECV returns when all
tasks have gone.

Signed-off-by: Andrei Vagin <avagin@google.com>
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 54 +++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 783ebce8c4de..390781d7c951 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -3954,6 +3954,60 @@ TEST(user_notification_filter_empty)
 	EXPECT_GT((pollfd.revents & POLLHUP) ?: 0, 0);
 }
 
+TEST(user_ioctl_notification_filter_empty)
+{
+	pid_t pid;
+	long ret;
+	int status, p[2];
+	struct __clone_args args = {
+		.flags = CLONE_FILES,
+		.exit_signal = SIGCHLD,
+	};
+	struct seccomp_notif req = {};
+
+	ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+	ASSERT_EQ(0, ret) {
+		TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
+	}
+
+	if (__NR_clone3 < 0)
+		SKIP(return, "Test not built with clone3 support");
+
+	ASSERT_EQ(0, pipe(p));
+
+	pid = sys_clone3(&args, sizeof(args));
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0) {
+		int listener;
+
+		listener = user_notif_syscall(__NR_mknodat, SECCOMP_FILTER_FLAG_NEW_LISTENER);
+		if (listener < 0)
+			_exit(EXIT_FAILURE);
+
+		if (dup2(listener, 200) != 200)
+			_exit(EXIT_FAILURE);
+		close(p[1]);
+		close(listener);
+		sleep(1);
+
+		_exit(EXIT_SUCCESS);
+	}
+	if (read(p[0], &status, 1) != 0)
+		_exit(EXIT_SUCCESS);
+	close(p[0]);
+	/*
+	 * The seccomp filter has become unused so we should be notified once
+	 * the kernel gets around to cleaning up task struct.
+	 */
+	EXPECT_EQ(ioctl(200, SECCOMP_IOCTL_NOTIF_RECV, &req), -1);
+	EXPECT_EQ(errno, ENOENT);
+
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	EXPECT_EQ(true, WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+}
+
 static void *do_thread(void *data)
 {
 	return NULL;
-- 
2.45.1.288.g0e0cd299f1-goog



* Re: [PATCH 1/3] seccomp: interrupt SECCOMP_IOCTL_NOTIF_RECV when all users have exited
  2024-05-23  1:45 ` [PATCH 1/3] seccomp: interrupt SECCOMP_IOCTL_NOTIF_RECV when all users have exited Andrei Vagin
@ 2024-05-23  8:59   ` Oleg Nesterov
  2024-05-24 17:47     ` Andrei Vagin
  0 siblings, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2024-05-23  8:59 UTC
  To: Andrei Vagin
  Cc: Kees Cook, Andy Lutomirski, Will Drewry, Christian Brauner,
	linux-kernel, Tycho Andersen, Jens Axboe

Hi Andrei,

the patch looks good to me even if I don't really understand what
SECCOMP_IOCTL_NOTIF_RECV does. But let me ask a stupid question,

On 05/23, Andrei Vagin wrote:
>
> The change is based on the 'commit 99cdb8b9a573 ("seccomp: notify about
> unused filter")' which implemented (E)POLLHUP notifications.

To me this patch fixes the commit above, because without this change

> @@ -1466,7 +1466,7 @@ static int recv_wake_function(wait_queue_entry_t *wait, unsigned int mode, int s
>  				  void *key)
>  {
>  	/* Avoid a wakeup if event not interesting for us. */
> -	if (key && !(key_to_poll(key) & (EPOLLIN | EPOLLERR)))
> +	if (key && !(key_to_poll(key) & (EPOLLIN | EPOLLERR | EPOLLHUP)))

__seccomp_filter_orphan() -> wake_up_poll(&orig->wqh, EPOLLHUP) won't
wake up the task sleeping in recv_wait_event(), right?
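
The effect of the mask change can be modelled in userspace. This is
only a sketch: recv_should_wake() and the two macros are hypothetical
names, and a true result stands for "the wakeup is passed on to
autoremove_wake_function()".

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/epoll.h>

/* Events accepted by recv_wake_function() before the patch. */
#define RECV_WAKE_MASK_OLD (EPOLLIN | EPOLLERR)
/* Events accepted after the patch. */
#define RECV_WAKE_MASK_NEW (EPOLLIN | EPOLLERR | EPOLLHUP)

/*
 * Userspace model of the filter check. has_key mirrors "key != NULL"
 * in the kernel, where a NULL key means an unconditional wakeup.
 */
static bool recv_should_wake(bool has_key, unsigned int events,
			     unsigned int mask)
{
	assert(mask != 0);
	/* Avoid a wakeup if the event is not interesting for us. */
	if (has_key && !(events & mask))
		return false;
	return true;
}
```

With the old mask an EPOLLHUP-only wakeup is dropped, so a task
sleeping in recv_wait_event() stays asleep; with the new mask it is
woken.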

In any case, FWIW

Reviewed-by: Oleg Nesterov <oleg@redhat.com>



* Re: [PATCH 2/3] seccomp: release task filters when the task exits
  2024-05-23  1:45 ` [PATCH 2/3] seccomp: release task filters when the task exits Andrei Vagin
@ 2024-05-23  9:00   ` Oleg Nesterov
  2024-06-26 18:57   ` Kees Cook
  1 sibling, 0 replies; 12+ messages in thread
From: Oleg Nesterov @ 2024-05-23  9:00 UTC
  To: Andrei Vagin
  Cc: Kees Cook, Andy Lutomirski, Will Drewry, Christian Brauner,
	linux-kernel, Tycho Andersen, Jens Axboe

On 05/23, Andrei Vagin wrote:
>
> Previously, seccomp filters were released in release_task(), which
> required the process to exit and its zombie to be collected. However,
> exited threads/processes can't trigger any seccomp events, making it
> more logical to release filters upon task exits.
>
> This adjustment simplifies scenarios where a parent is tracing its child
> process. The parent process can now handle all events from a seccomp
> listening descriptor and then call wait to collect a child zombie.
>
> seccomp_filter_release takes the siglock to avoid races with
> seccomp_sync_threads. There was an idea to bypass taking the lock by
> checking PF_EXITING, but it can be set without holding siglock if
> threads have SIGNAL_GROUP_EXIT. This means it can happen concurrently
> with seccomp_filter_release.
>
> Signed-off-by: Andrei Vagin <avagin@google.com>
> ---
>  kernel/exit.c    |  3 ++-
>  kernel/seccomp.c | 22 ++++++++++++++++------
>  2 files changed, 18 insertions(+), 7 deletions(-)

Reviewed-by: Oleg Nesterov <oleg@redhat.com>



* Re: [PATCH 0/3 v2] seccomp: improve handling of SECCOMP_IOCTL_NOTIF_RECV
  2024-05-23  1:45 [PATCH 0/3 v2] seccomp: improve handling of SECCOMP_IOCTL_NOTIF_RECV Andrei Vagin
                   ` (2 preceding siblings ...)
  2024-05-23  1:45 ` [PATCH 3/3] selftests/seccomp: add test for NOTIF_RECV and unused filters Andrei Vagin
@ 2024-05-23  9:33 ` Oleg Nesterov
  2024-06-25  0:19   ` Andrei Vagin
  2024-06-25  0:17 ` Andrei Vagin
  4 siblings, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2024-05-23  9:33 UTC
  To: Andrei Vagin
  Cc: Kees Cook, Andy Lutomirski, Will Drewry, Christian Brauner,
	linux-kernel, Tycho Andersen, Jens Axboe

On 05/23, Andrei Vagin wrote:
>
> This patch set addresses two problems with the SECCOMP_IOCTL_NOTIF_RECV
> ioctl:
> * it doesn't return when the seccomp filter becomes unused (all tasks
>   have exited).
> * EPOLLHUP is triggered not when a task exits, but rather when its zombie
>   is collected.

It seems that 2/3 also fixes another minor problem.

Suppose that a group leader installs the new filter without
SECCOMP_FILTER_FLAG_TSYNC, exits, and becomes a zombie. It can't be
released until all its sub-threads exit.

After that, without 2/3, SECCOMP_FILTER_FLAG_TSYNC from any other thread
can never succeed, seccomp_can_sync_threads() will check a zombie leader
and is_ancestor() will fail.

Right?
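
The scenario can be sketched as a small userspace model. All names
below are hypothetical, the flag value is arbitrary, and the model
ignores seccomp modes and locking; it only mirrors the ancestor walk and
the new PF_EXITING skip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PF_EXITING 0x4	/* hypothetical flag value for this model */

struct model_filter {
	struct model_filter *prev;	/* parent filter in the tree */
};

struct model_thread {
	int tid;
	unsigned int flags;
	struct model_filter *filter;	/* NULL: no filter installed */
};

/* True if parent is NULL or appears on child's prev chain. */
static bool model_is_ancestor(const struct model_filter *parent,
			      const struct model_filter *child)
{
	if (parent == NULL)
		return true;
	for (; child; child = child->prev)
		if (child == parent)
			return true;
	return false;
}

/*
 * Return 0 if every other thread can adopt the caller's filter tree,
 * otherwise the tid of the first thread that blocks the sync.
 * skip_exiting selects the behaviour with (true) or without (false) 2/3.
 */
static int model_can_sync(const struct model_thread *threads, size_t n,
			  const struct model_thread *caller,
			  bool skip_exiting)
{
	assert(caller != NULL);
	for (size_t i = 0; i < n; i++) {
		const struct model_thread *t = &threads[i];

		if (t == caller)
			continue;
		if (skip_exiting && (t->flags & MODEL_PF_EXITING))
			continue;
		if (!model_is_ancestor(t->filter, caller->filter))
			return t->tid;
	}
	return 0;
}
```

With an exiting leader that holds its own filter and a sub-thread that
holds none, the model reports the leader as the blocker when
skip_exiting is false and reports success when it is true.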

Oleg.



* Re: [PATCH 1/3] seccomp: interrupt SECCOMP_IOCTL_NOTIF_RECV when all users have exited
  2024-05-23  8:59   ` Oleg Nesterov
@ 2024-05-24 17:47     ` Andrei Vagin
  0 siblings, 0 replies; 12+ messages in thread
From: Andrei Vagin @ 2024-05-24 17:47 UTC
  To: Oleg Nesterov
  Cc: Kees Cook, Andy Lutomirski, Will Drewry, Christian Brauner,
	linux-kernel, Tycho Andersen, Jens Axboe

On Thu, May 23, 2024 at 2:00 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> Hi Andrei,
>
> the patch looks good to me even if I don't really understand what
> SECCOMP_IOCTL_NOTIF_RECV does. But let me ask a stupid question,
>
> On 05/23, Andrei Vagin wrote:
> >
> > The change is based on the 'commit 99cdb8b9a573 ("seccomp: notify about
> > unused filter")' which implemented (E)POLLHUP notifications.
>
> To me this patch fixes the commit above, because without this change

It depends on how we look at it. I think the intention was to make the
epoll/poll/select syscalls return (E)POLLHUP notifications when
filters have been orphaned. Plus, this code looked a bit different at
that time, and recv_wake_function used another notification mechanism.

>
> > @@ -1466,7 +1466,7 @@ static int recv_wake_function(wait_queue_entry_t *wait, unsigned int mode, int s
> >                                 void *key)
> >  {
> >       /* Avoid a wakeup if event not interesting for us. */
> > -     if (key && !(key_to_poll(key) & (EPOLLIN | EPOLLERR)))
> > +     if (key && !(key_to_poll(key) & (EPOLLIN | EPOLLERR | EPOLLHUP)))
>
> __seccomp_filter_orphan() -> wake_up_poll(&orig->wqh, EPOLLHUP) won't
> wakeup the task sleeping in recv_wait_event(), right ?
>
> In any case, FWIW
>
> Reviewed-by: Oleg Nesterov <oleg@redhat.com>

Thanks,
Andrei


* Re: [PATCH 0/3 v2] seccomp: improve handling of SECCOMP_IOCTL_NOTIF_RECV
  2024-05-23  1:45 [PATCH 0/3 v2] seccomp: improve handling of SECCOMP_IOCTL_NOTIF_RECV Andrei Vagin
                   ` (3 preceding siblings ...)
  2024-05-23  9:33 ` [PATCH 0/3 v2] seccomp: improve handling of SECCOMP_IOCTL_NOTIF_RECV Oleg Nesterov
@ 2024-06-25  0:17 ` Andrei Vagin
  2024-06-26 19:00   ` Kees Cook
  4 siblings, 1 reply; 12+ messages in thread
From: Andrei Vagin @ 2024-06-25  0:17 UTC
  To: Kees Cook, Andy Lutomirski, Will Drewry, Oleg Nesterov,
	Christian Brauner
  Cc: linux-kernel, Tycho Andersen, Jens Axboe

Kees,

Are you waiting for anything from me? I think this series is ready to be merged.

Thanks,
Andrei

On Wed, May 22, 2024 at 6:45 PM Andrei Vagin <avagin@google.com> wrote:
>
> This patch set addresses two problems with the SECCOMP_IOCTL_NOTIF_RECV
> ioctl:
> * it doesn't return when the seccomp filter becomes unused (all tasks
>   have exited).
> * EPOLLHUP is triggered not when a task exits, but rather when its zombie
>   is collected.
>
> v2: - Remove unnecessary checks of PF_EXITING.
>     - Take siglock with disabling irqs.
>     Thanks to Oleg for the review and the help with the first version.
>
> Andrei Vagin (3):
>   seccomp: interrupt SECCOMP_IOCTL_NOTIF_RECV when all users have exited
>   seccomp: release task filters when the task exits
>   selftests/seccomp: add test for NOTIF_RECV and unused filters
>
>  kernel/exit.c                                 |  3 +-
>  kernel/seccomp.c                              | 38 ++++++++++---
>  tools/testing/selftests/seccomp/seccomp_bpf.c | 54 +++++++++++++++++++
>  3 files changed, 88 insertions(+), 7 deletions(-)
>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Will Drewry <wad@chromium.org>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Tycho Andersen <tandersen@netflix.com>
>
>
> --
> 2.45.0.rc1.225.g2a3ae87e7f-goog
>


* Re: [PATCH 0/3 v2] seccomp: improve handling of SECCOMP_IOCTL_NOTIF_RECV
  2024-05-23  9:33 ` [PATCH 0/3 v2] seccomp: improve handling of SECCOMP_IOCTL_NOTIF_RECV Oleg Nesterov
@ 2024-06-25  0:19   ` Andrei Vagin
  0 siblings, 0 replies; 12+ messages in thread
From: Andrei Vagin @ 2024-06-25  0:19 UTC
  To: Oleg Nesterov
  Cc: Kees Cook, Andy Lutomirski, Will Drewry, Christian Brauner,
	linux-kernel, Tycho Andersen, Jens Axboe

On Thu, May 23, 2024 at 2:35 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 05/23, Andrei Vagin wrote:
> >
> > This patch set addresses two problems with the SECCOMP_IOCTL_NOTIF_RECV
> > ioctl:
> > * it doesn't return when the seccomp filter becomes unused (all tasks
> >   have exited).
> > * EPOLLHUP is triggered not when a task exits, but rather when its zombie
> >   is collected.
>
> It seems that 2/3 also fixes another minor problem.
>
> Suppose that a group leader installs the new filter without
> SECCOMP_FILTER_FLAG_TSYNC, exits, and becomes a zombie. It can't be
> released until all its sub-threads exit.
>
> After that, without 2/3, SECCOMP_FILTER_FLAG_TSYNC from any other thread
> can never succeed, seccomp_can_sync_threads() will check a zombie leader
> and is_ancestor() will fail.
>
> Right?

It is right. I can introduce a self test for this case too, but let's
do that in a separate patch set.

>
> Oleg.
>


* Re: [PATCH 2/3] seccomp: release task filters when the task exits
  2024-05-23  1:45 ` [PATCH 2/3] seccomp: release task filters when the task exits Andrei Vagin
  2024-05-23  9:00   ` Oleg Nesterov
@ 2024-06-26 18:57   ` Kees Cook
  1 sibling, 0 replies; 12+ messages in thread
From: Kees Cook @ 2024-06-26 18:57 UTC
  To: Andrei Vagin
  Cc: Andy Lutomirski, Will Drewry, Oleg Nesterov, Christian Brauner,
	linux-kernel, Tycho Andersen, Jens Axboe

On Thu, May 23, 2024 at 01:45:39AM +0000, Andrei Vagin wrote:
> Previously, seccomp filters were released in release_task(), which
> required the process to exit and its zombie to be collected. However,
> exited threads/processes can't trigger any seccomp events, making it
> more logical to release filters upon task exits.
> 
> This adjustment simplifies scenarios where a parent is tracing its child
> process. The parent process can now handle all events from a seccomp
> listening descriptor and then call wait to collect a child zombie.
> 
> seccomp_filter_release takes the siglock to avoid races with
> seccomp_sync_threads. There was an idea to bypass taking the lock by
> checking PF_EXITING, but it can be set without holding siglock if
> threads have SIGNAL_GROUP_EXIT. This means it can happen concurrently
> with seccomp_filter_release.
> 
> Signed-off-by: Andrei Vagin <avagin@google.com>
> ---
>  kernel/exit.c    |  3 ++-
>  kernel/seccomp.c | 22 ++++++++++++++++------
>  2 files changed, 18 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 41a12630cbbc..23439c021d8d 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -278,7 +278,6 @@ void release_task(struct task_struct *p)
>  	}
>  
>  	write_unlock_irq(&tasklist_lock);
> -	seccomp_filter_release(p);
>  	proc_flush_pid(thread_pid);
>  	put_pid(thread_pid);
>  	release_thread(p);
> @@ -836,6 +835,8 @@ void __noreturn do_exit(long code)
>  	io_uring_files_cancel();
>  	exit_signals(tsk);  /* sets PF_EXITING */
>  
> +	seccomp_filter_release(tsk);
> +
>  	acct_update_integrals(tsk);
>  	group_dead = atomic_dec_and_test(&tsk->signal->live);
>  	if (group_dead) {
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 35435e8f1035..67305e776dd3 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -502,6 +502,9 @@ static inline pid_t seccomp_can_sync_threads(void)
>  		/* Skip current, since it is initiating the sync. */
>  		if (thread == caller)
>  			continue;
> +		/* Skip exited threads. */
> +		if (thread->flags & PF_EXITING)
> +			continue;
>  
>  		if (thread->seccomp.mode == SECCOMP_MODE_DISABLED ||
>  		    (thread->seccomp.mode == SECCOMP_MODE_FILTER &&
> @@ -563,18 +566,18 @@ static void __seccomp_filter_release(struct seccomp_filter *orig)
>   * @tsk: task the filter should be released from.
>   *
>   * This function should only be called when the task is exiting as
> - * it detaches it from its filter tree. As such, READ_ONCE() and
> - * barriers are not needed here, as would normally be needed.
> + * it detaches it from its filter tree. PF_EXITING has to be set
> + * for the task.

Let's capture this requirement with a WARN_ON() (like was done for the
sighand case before). So before the spinlock, check for PF_EXITING and
fail safe (don't release):

	if (WARN_ON((tsk->flags & PF_EXITING) == 0))
		return;

>   */
>  void seccomp_filter_release(struct task_struct *tsk)
>  {
> -	struct seccomp_filter *orig = tsk->seccomp.filter;
> -
> -	/* We are effectively holding the siglock by not having any sighand. */
> -	WARN_ON(tsk->sighand != NULL);
> +	struct seccomp_filter *orig;
>  
> +	spin_lock_irq(&current->sighand->siglock);

Shouldn't this be "tsk" not "current"?

> +	orig = tsk->seccomp.filter;
>  	/* Detach task from its filter tree. */
>  	tsk->seccomp.filter = NULL;
> +	spin_unlock_irq(&current->sighand->siglock);

Same.

>  	__seccomp_filter_release(orig);
>  }
>  
> @@ -602,6 +605,13 @@ static inline void seccomp_sync_threads(unsigned long flags)
>  		if (thread == caller)
>  			continue;
>  
> +		/*
> +		 * Skip exited threads. seccomp_filter_release could have
> +		 * been already called for this task.
> +		 */
> +		if (thread->flags & PF_EXITING)
> +			continue;
> +
>  		/* Get a task reference for the new leaf node. */
>  		get_seccomp_filter(caller);
>  
> -- 
> 2.45.1.288.g0e0cd299f1-goog
> 

Otherwise, looks good!
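
The two points above can be illustrated with a userspace model. This is
a sketch only and not a proposed v3: every name is hypothetical, a C11
atomic_flag stands in for the siglock, a plain early return stands in
for WARN_ON(), and a bare counter stands in for
__seccomp_filter_release().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PF_EXITING 0x4	/* hypothetical flag value for this model */

struct model_filter {
	int users;		/* tasks still attached to the filter */
};

struct model_task {
	unsigned int flags;
	atomic_flag siglock;	/* stands in for tsk->sighand->siglock */
	struct model_filter *filter;
};

static void model_lock(atomic_flag *lock)
{
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;	/* spin until the flag is released */
}

static void model_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/*
 * Detach tsk from its filter. Returns false and leaves the filter
 * attached if the task is not marked as exiting (the fail-safe path).
 */
static bool model_filter_release(struct model_task *tsk)
{
	struct model_filter *orig;

	assert(tsk != NULL);
	if ((tsk->flags & MODEL_PF_EXITING) == 0)
		return false;

	/* Take the lock of the task being released, not the caller's. */
	model_lock(&tsk->siglock);
	orig = tsk->filter;
	tsk->filter = NULL;
	model_unlock(&tsk->siglock);

	if (orig)
		orig->users--;
	return true;
}
```

Calling model_filter_release() on a task without MODEL_PF_EXITING set
leaves the filter untouched, which is the fail-safe behaviour asked for
above.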

-- 
Kees Cook


* Re: [PATCH 0/3 v2] seccomp: improve handling of SECCOMP_IOCTL_NOTIF_RECV
  2024-06-25  0:17 ` Andrei Vagin
@ 2024-06-26 19:00   ` Kees Cook
  0 siblings, 0 replies; 12+ messages in thread
From: Kees Cook @ 2024-06-26 19:00 UTC
  To: Andrei Vagin
  Cc: Andy Lutomirski, Will Drewry, Oleg Nesterov, Christian Brauner,
	linux-kernel, Tycho Andersen, Jens Axboe

On Mon, Jun 24, 2024 at 05:17:09PM -0700, Andrei Vagin wrote:
> Are you waiting for anything from me? I think this series is ready to be merged.

Oops, sorry for the silence! I had been waiting for Oleg's review, and
then it happened and I missed it, so it fell off my TODO list. :)

I just sent a reply with 2 bits of feedback, but with a v3, I think I
can land this ASAP.

Thanks for chasing me down, and I appreciate the selftest updates!

-Kees

-- 
Kees Cook

