* Re: [PATCH 2/3] seccomp: release task filters when the task exits [not found] ` <CAEWA0a5dBvRwGAnztL56i=JV-WGGiaTd-GdJYdOxZmq1c+bdpg@mail.gmail.com> @ 2024-05-16 9:34 ` Oleg Nesterov 2024-05-16 13:09 ` Oleg Nesterov 0 siblings, 1 reply; 8+ messages in thread From: Oleg Nesterov @ 2024-05-16 9:34 UTC (permalink / raw) To: Andrei Vagin Cc: Kees Cook, Tycho Andersen, Andy Lutomirski, Will Drewry, Jens Axboe, Christian Brauner, linux-kernel (add lkml) On 05/15, Andrei Vagin wrote: > > On Wed, May 15, 2024 at 5:52 AM Oleg Nesterov <oleg@redhat.com> wrote: > > > > Let me repeat I forgot everything about seccomp, but let me ask > > a couple of questions... > > It seems you still remember something:). Thank you for the feedback. Just I am still remember how to use grep ;) > > > @@ -2126,6 +2137,11 @@ static struct seccomp_filter *get_nth_filter(struct task_struct *task, > > > */ > > > spin_lock_irq(&task->sighand->siglock); > > > > > > + if (task->flags & PF_EXITING) { > > > + spin_unlock_irq(&task->sighand->siglock); > > > + return ERR_PTR(-EINVAL); > > > + } > > > > Why do we need the PF_EXITING check here? > > > > This looks unnecessary even if get_nth_filter() could race with the > > exiting task, but this doesn't matter. > > > > This race is not possible, get_nth_filter() is only called from ptrace() > > paths, but the tracee can't stop in TASK_TRACED after exit_signals() which > > sets PF_EXITING. > > If we rely on using seccomp_get_filter only from ptrace, you are right. Plus it too does __get_seccomp_filter/__get_seccomp_filter, so I guess it should be safe without this check even if it could be used outside of ptrace. Just like proc_pid_seccomp_cache(), see below. > > > @@ -2494,6 +2510,11 @@ int proc_pid_seccomp_cache(struct seq_file *m, struct pid_namespace *ns, > > > if (!lock_task_sighand(task, &flags)) > > > return -ESRCH; > > > > > > + if (thread->flags & PF_EXITING) { > > > + unlock_task_sighand(task, &flags); > > > + return 0; > > > > Again, do we really need this check? > > > > It can race with the exiting task and (without this check) do > > __get_seccomp_filter(f) right before seccomp_filter_release() > > takes sighand->siglock. But why is it bad? > > I think you are right, this check isn't required. > > > > > OTOH. I guess proc_pid_seccomp_cache() is the only reason why > > seccomp_filter_release() takes ->siglock with your patch? > > seccomp_sync_threads and seccomp_can_sync_threads should be considered too. Yes. But we only need to consider them in the multi-thread case, right? In this case exit_signals() sets PF_EXITING under ->siglock, so they can't miss this flag, seccomp_filter_release() doesn't need to take siglock. > If we check PF_EXITING in all of them, we don't need to take ->siglock in > seccomp_filter_release. Does it sound right? The problem is a single-threaded exiting task. In this case exit_signals() sets PF_EXITING lockless. This means that in this case - proc_pid_seccomp_cache() can't rely on the PF_EXITING check but it can be safely removed. - seccomp_filter_release() needs to take ->siglock to avoid the race with proc_pid_seccomp_cache(). And this chunk from your patch static void __seccomp_filter_orphan(struct seccomp_filter *orig) { + lockdep_assert_held(¤t->sighand->siglock); + looks unnecessary too, seccomp_filter_release() can just do spin_lock_irq(siglock); orig = tsk->seccomp.filter; tsk->seccomp.filter = NULL; spin_unlock_irq(siglock); __seccomp_filter_release(orig); Right? Oleg. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/3] seccomp: release task filters when the task exits 2024-05-16 9:34 ` [PATCH 2/3] seccomp: release task filters when the task exits Oleg Nesterov @ 2024-05-16 13:09 ` Oleg Nesterov 2024-05-22 6:49 ` Andrei Vagin 0 siblings, 1 reply; 8+ messages in thread From: Oleg Nesterov @ 2024-05-16 13:09 UTC (permalink / raw) To: Andrei Vagin Cc: Kees Cook, Tycho Andersen, Andy Lutomirski, Will Drewry, Jens Axboe, Christian Brauner, linux-kernel On 05/16, Oleg Nesterov wrote: > > On 05/15, Andrei Vagin wrote: > > > > seccomp_sync_threads and seccomp_can_sync_threads should be considered too. > > Yes. But we only need to consider them in the multi-thread case, right? > In this case exit_signals() sets PF_EXITING under ->siglock, so they can't > miss this flag, seccomp_filter_release() doesn't need to take siglock. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Ah, no. seccomp_filter_release() does need to take ->siglock even if we forget about proc_pid_seccomp_cache(). Without siglock orig = tsk->seccomp.filter; can leak into the critical section in exit_signals() (spin_unlock is the one-way barrier) and this LOAD can be reordered with "flags |= PF_EXITING". Hmm. I thought we have something smp_mb__after_unlock(), but it seems we don't. So we can't add a fast-path if (!tsk->seccomp.filter) return; check at the start of seccomp_filter_release(). Cough... Now that I look at seccomp_can_sync_threads() I think it too doesn't need the PF_EXITING check. If it is called before seccomp_filter_release(), this doesn't really differ from the case when it is called before do_exit/exit_signals. If it is called after seccomp_filter_release(), then is_ancestor() must be true. But perhaps I missed something, I won't insist, up to you. > > If we check PF_EXITING in all of them, we don't need to take ->siglock in > > seccomp_filter_release. Does it sound right? > > The problem is a single-threaded exiting task. In this case exit_signals() > sets PF_EXITING lockless. This means that in this case > > - proc_pid_seccomp_cache() can't rely on the PF_EXITING check > but it can be safely removed. > > - seccomp_filter_release() needs to take ->siglock to avoid the > race with proc_pid_seccomp_cache(). > > And this chunk from your patch > > static void __seccomp_filter_orphan(struct seccomp_filter *orig) > { > + lockdep_assert_held(¤t->sighand->siglock); > + > > looks unnecessary too, seccomp_filter_release() can just do > > spin_lock_irq(siglock); > orig = tsk->seccomp.filter; > tsk->seccomp.filter = NULL; > spin_unlock_irq(siglock); > > __seccomp_filter_release(orig); > > Right? > > Oleg. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/3] seccomp: release task filters when the task exits 2024-05-16 13:09 ` Oleg Nesterov @ 2024-05-22 6:49 ` Andrei Vagin 2024-05-22 7:06 ` Andrei Vagin 0 siblings, 1 reply; 8+ messages in thread From: Andrei Vagin @ 2024-05-22 6:49 UTC (permalink / raw) To: Oleg Nesterov Cc: Kees Cook, Tycho Andersen, Andy Lutomirski, Will Drewry, Jens Axboe, Christian Brauner, linux-kernel On Thu, May 16, 2024 at 6:10 AM Oleg Nesterov <oleg@redhat.com> wrote: > > On 05/16, Oleg Nesterov wrote: > > > > On 05/15, Andrei Vagin wrote: > > > > > > seccomp_sync_threads and seccomp_can_sync_threads should be considered too. > > > > Yes. But we only need to consider them in the multi-thread case, right? > > In this case exit_signals() sets PF_EXITING under ->siglock, so they can't > > miss this flag, seccomp_filter_release() doesn't need to take siglock. > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > Ah, no. seccomp_filter_release() does need to take ->siglock even if we > forget about proc_pid_seccomp_cache(). > > Without siglock > > orig = tsk->seccomp.filter; > > can leak into the critical section in exit_signals() (spin_unlock is the > one-way barrier) and this LOAD can be reordered with "flags |= PF_EXITING". > > Hmm. I thought we have something smp_mb__after_unlock(), but it seems we > don't. So we can't add a fast-path We have smp_mb__after_unlock_lock in include/linux/rcupdate.h. > > if (!tsk->seccomp.filter) > return; > > check at the start of seccomp_filter_release(). > > > Cough... Now that I look at seccomp_can_sync_threads() I think it too > doesn't need the PF_EXITING check. > > If it is called before seccomp_filter_release(), this doesn't really > differ from the case when it is called before do_exit/exit_signals. > > If it is called after seccomp_filter_release(), then is_ancestor() > must be true. > > But perhaps I missed something, I won't insist, up to you. > > > > If we check PF_EXITING in all of them, we don't need to take ->siglock in > > > seccomp_filter_release. Does it sound right? > > > > The problem is a single-threaded exiting task. In this case exit_signals() > > sets PF_EXITING lockless. This means that in this case > > > > - proc_pid_seccomp_cache() can't rely on the PF_EXITING check > > but it can be safely removed. > > > > - seccomp_filter_release() needs to take ->siglock to avoid the > > race with proc_pid_seccomp_cache(). > > > > And this chunk from your patch > > > > static void __seccomp_filter_orphan(struct seccomp_filter *orig) > > { > > + lockdep_assert_held(¤t->sighand->siglock); > > + > > > > looks unnecessary too, seccomp_filter_release() can just do > > > > spin_lock_irq(siglock); > > orig = tsk->seccomp.filter; > > tsk->seccomp.filter = NULL; > > spin_unlock_irq(siglock); > > > > __seccomp_filter_release(orig); > > > > Right? > > > > Oleg. > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/3] seccomp: release task filters when the task exits 2024-05-22 6:49 ` Andrei Vagin @ 2024-05-22 7:06 ` Andrei Vagin 2024-05-22 10:35 ` Oleg Nesterov 0 siblings, 1 reply; 8+ messages in thread From: Andrei Vagin @ 2024-05-22 7:06 UTC (permalink / raw) To: Oleg Nesterov Cc: Kees Cook, Tycho Andersen, Andy Lutomirski, Will Drewry, Jens Axboe, Christian Brauner, linux-kernel > On Thu, May 16, 2024 at 6:10 AM Oleg Nesterov <oleg@redhat.com> wrote: > > > > On 05/16, Oleg Nesterov wrote: > > > > > > On 05/15, Andrei Vagin wrote: > > > > > > > > seccomp_sync_threads and seccomp_can_sync_threads should be considered too. > > > > > > Yes. But we only need to consider them in the multi-thread case, right? > > > In this case exit_signals() sets PF_EXITING under ->siglock, so they can't > > > miss this flag, seccomp_filter_release() doesn't need to take siglock. > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ PF_EXITING is set without holding ->siglock if tsk->signal has the SIGNAL_GROUP_EXIT flag. I think it can be a case when one thread is in seccomp_sync_threads and others are exiting. The first thread can check that PF_EXITING isn't set for another thread. Then, the second thread calls exit_signals and seccomp_filter_release(), and finally, the first thread sets its seccomp.filter to the second thread. If seccomp_filter_release takes siglock, it will be handled properly. > > > > Ah, no. seccomp_filter_release() does need to take ->siglock even if we > > forget about proc_pid_seccomp_cache(). > > > > Without siglock > > > > orig = tsk->seccomp.filter; > > > > can leak into the critical section in exit_signals() (spin_unlock is the > > one-way barrier) and this LOAD can be reordered with "flags |= PF_EXITING". > > > > Hmm. I thought we have something smp_mb__after_unlock(), but it seems we > > don't. So we can't add a fast-path We have smp_mb__after_unlock_lock in include/linux/rcupdate.h. > > > > if (!tsk->seccomp.filter) > > return; > > > > check at the start of seccomp_filter_release(). > > > > > > Cough... Now that I look at seccomp_can_sync_threads() I think it too > > doesn't need the PF_EXITING check. > > > > If it is called before seccomp_filter_release(), this doesn't really > > differ from the case when it is called before do_exit/exit_signals. > > > > If it is called after seccomp_filter_release(), then is_ancestor() > > must be true. > > > > But perhaps I missed something, I won't insist, up to you. > > You are right, this check isn't required in seccomp_can_sync_threads, but I decided that it is better to be consistent with seccomp_sync_threads. Thanks, Andrei ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/3] seccomp: release task filters when the task exits 2024-05-22 7:06 ` Andrei Vagin @ 2024-05-22 10:35 ` Oleg Nesterov 0 siblings, 0 replies; 8+ messages in thread From: Oleg Nesterov @ 2024-05-22 10:35 UTC (permalink / raw) To: Andrei Vagin Cc: Kees Cook, Tycho Andersen, Andy Lutomirski, Will Drewry, Jens Axboe, Christian Brauner, linux-kernel On 05/22, Andrei Vagin wrote: > > > On Thu, May 16, 2024 at 6:10 AM Oleg Nesterov <oleg@redhat.com> wrote: > > > > > > On 05/16, Oleg Nesterov wrote: > > > > > > > > On 05/15, Andrei Vagin wrote: > > > > > > > > > > seccomp_sync_threads and seccomp_can_sync_threads should be considered too. > > > > > > > > Yes. But we only need to consider them in the multi-thread case, right? > > > > In this case exit_signals() sets PF_EXITING under ->siglock, so they can't > > > > miss this flag, seccomp_filter_release() doesn't need to take siglock. > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > PF_EXITING is set without holding ->siglock if tsk->signal has the > SIGNAL_GROUP_EXIT flag. I think it can be a case when one thread is in > seccomp_sync_threads and others are exiting. Yes, I forgot this. > > > Hmm. I thought we have something smp_mb__after_unlock(), but it seems we > > > don't. So we can't add a fast-path > > We have smp_mb__after_unlock_lock in include/linux/rcupdate.h. This is another thing. But sorry for confusion, this doesn't really matter, we could you a plain mb(). I mean, I was thinking about something like seccomp_filter_release: smp_mb(); if (!READ_ONCE(tsk->seccomp.filter)) return; spin_lock_irq(siglock); orig = tsk->seccomp.filter; ... but then seccomp_sync_threads() should do something like orig = READ_ONCE(thread->seccomp.filter); smp_store_release(&thread->seccomp.filter, caller->seccomp.filter); smp_mb(); // pairs with mb() in seccomp_filter_release() if (READ_ONCE(thread->flags) & PF_EXITING) { WRITE_ONCE(thread->seccomp.filter, orig); continue; } __seccomp_filter_release(orig); ... too subtle even _if_ correct, and I am not sure at all this would be correct. > > > Cough... Now that I look at seccomp_can_sync_threads() I think it too > > > doesn't need the PF_EXITING check. > > > > > > If it is called before seccomp_filter_release(), this doesn't really > > > differ from the case when it is called before do_exit/exit_signals. > > > > > > If it is called after seccomp_filter_release(), then is_ancestor() > > > must be true. > > > > > > But perhaps I missed something, I won't insist, up to you. > > > > > You are right, this check isn't required in seccomp_can_sync_threads, but > I decided that it is better to be consistent with seccomp_sync_threads. OK, agreed. Oleg. ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 0/3 v2] seccomp: improve handling of SECCOMP_IOCTL_NOTIF_RECV
@ 2024-05-23 1:45 Andrei Vagin
2024-05-23 1:45 ` [PATCH 2/3] seccomp: release task filters when the task exits Andrei Vagin
0 siblings, 1 reply; 8+ messages in thread
From: Andrei Vagin @ 2024-05-23 1:45 UTC (permalink / raw)
To: Kees Cook, Andy Lutomirski, Will Drewry, Oleg Nesterov,
Christian Brauner
Cc: linux-kernel, Tycho Andersen, Andrei Vagin, Jens Axboe
This patch set addresses two problems with the SECCOMP_IOCTL_NOTIF_RECV
ioctl:
* it doesn't return when the seccomp filter becomes unused (all tasks
have exited).
* EPOLLHUP is triggered not when a task exits, but rather when its zombie
is collected.
v2: - Remove unnecessary checks of PF_EXITING.
- Take siglock with disabling irqs.
Thanks to Oleg for the review and the help with the first version.
Andrei Vagin (3):
seccomp: interrupt SECCOMP_IOCTL_NOTIF_RECV when all users have exited
seccomp: release task filters when the task exits
selftests/seccomp: add test for NOTIF_RECV and unused filters
kernel/exit.c | 3 +-
kernel/seccomp.c | 38 ++++++++++---
tools/testing/selftests/seccomp/seccomp_bpf.c | 54 +++++++++++++++++++
3 files changed, 88 insertions(+), 7 deletions(-)
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tycho Andersen <tandersen@netflix.com>
--
2.45.0.rc1.225.g2a3ae87e7f-goog
^ permalink raw reply [flat|nested] 8+ messages in thread* [PATCH 2/3] seccomp: release task filters when the task exits 2024-05-23 1:45 [PATCH 0/3 v2] seccomp: improve handling of SECCOMP_IOCTL_NOTIF_RECV Andrei Vagin @ 2024-05-23 1:45 ` Andrei Vagin 2024-05-23 9:00 ` Oleg Nesterov 2024-06-26 18:57 ` Kees Cook 0 siblings, 2 replies; 8+ messages in thread From: Andrei Vagin @ 2024-05-23 1:45 UTC (permalink / raw) To: Kees Cook, Andy Lutomirski, Will Drewry, Oleg Nesterov, Christian Brauner Cc: linux-kernel, Tycho Andersen, Andrei Vagin, Jens Axboe Previously, seccomp filters were released in release_task(), which required the process to exit and its zombie to be collected. However, exited threads/processes can't trigger any seccomp events, making it more logical to release filters upon task exits. This adjustment simplifies scenarios where a parent is tracing its child process. The parent process can now handle all events from a seccomp listening descriptor and then call wait to collect a child zombie. seccomp_filter_release takes the siglock to avoid races with seccomp_sync_threads. There was an idea to bypass taking the lock by checking PF_EXITING, but it can be set without holding siglock if threads have SIGNAL_GROUP_EXIT. This means it can happen concurently with seccomp_filter_release. Signed-off-by: Andrei Vagin <avagin@google.com> --- kernel/exit.c | 3 ++- kernel/seccomp.c | 22 ++++++++++++++++------ 2 files changed, 18 insertions(+), 7 deletions(-) diff --git a/kernel/exit.c b/kernel/exit.c index 41a12630cbbc..23439c021d8d 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -278,7 +278,6 @@ void release_task(struct task_struct *p) } write_unlock_irq(&tasklist_lock); - seccomp_filter_release(p); proc_flush_pid(thread_pid); put_pid(thread_pid); release_thread(p); @@ -836,6 +835,8 @@ void __noreturn do_exit(long code) io_uring_files_cancel(); exit_signals(tsk); /* sets PF_EXITING */ + seccomp_filter_release(tsk); + acct_update_integrals(tsk); group_dead = atomic_dec_and_test(&tsk->signal->live); if (group_dead) { diff --git a/kernel/seccomp.c b/kernel/seccomp.c index 35435e8f1035..67305e776dd3 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -502,6 +502,9 @@ static inline pid_t seccomp_can_sync_threads(void) /* Skip current, since it is initiating the sync. */ if (thread == caller) continue; + /* Skip exited threads. */ + if (thread->flags & PF_EXITING) + continue; if (thread->seccomp.mode == SECCOMP_MODE_DISABLED || (thread->seccomp.mode == SECCOMP_MODE_FILTER && @@ -563,18 +566,18 @@ static void __seccomp_filter_release(struct seccomp_filter *orig) * @tsk: task the filter should be released from. * * This function should only be called when the task is exiting as - * it detaches it from its filter tree. As such, READ_ONCE() and - * barriers are not needed here, as would normally be needed. + * it detaches it from its filter tree. PF_EXITING has to be set + * for the task. */ void seccomp_filter_release(struct task_struct *tsk) { - struct seccomp_filter *orig = tsk->seccomp.filter; - - /* We are effectively holding the siglock by not having any sighand. */ - WARN_ON(tsk->sighand != NULL); + struct seccomp_filter *orig; + spin_lock_irq(¤t->sighand->siglock); + orig = tsk->seccomp.filter; /* Detach task from its filter tree. */ tsk->seccomp.filter = NULL; + spin_unlock_irq(¤t->sighand->siglock); __seccomp_filter_release(orig); } @@ -602,6 +605,13 @@ static inline void seccomp_sync_threads(unsigned long flags) if (thread == caller) continue; + /* + * Skip exited threads. seccomp_filter_release could have + * been already called for this task. + */ + if (thread->flags & PF_EXITING) + continue; + /* Get a task reference for the new leaf node. */ get_seccomp_filter(caller); -- 2.45.1.288.g0e0cd299f1-goog ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 2/3] seccomp: release task filters when the task exits 2024-05-23 1:45 ` [PATCH 2/3] seccomp: release task filters when the task exits Andrei Vagin @ 2024-05-23 9:00 ` Oleg Nesterov 2024-06-26 18:57 ` Kees Cook 1 sibling, 0 replies; 8+ messages in thread From: Oleg Nesterov @ 2024-05-23 9:00 UTC (permalink / raw) To: Andrei Vagin Cc: Kees Cook, Andy Lutomirski, Will Drewry, Christian Brauner, linux-kernel, Tycho Andersen, Jens Axboe On 05/23, Andrei Vagin wrote: > > Previously, seccomp filters were released in release_task(), which > required the process to exit and its zombie to be collected. However, > exited threads/processes can't trigger any seccomp events, making it > more logical to release filters upon task exits. > > This adjustment simplifies scenarios where a parent is tracing its child > process. The parent process can now handle all events from a seccomp > listening descriptor and then call wait to collect a child zombie. > > seccomp_filter_release takes the siglock to avoid races with > seccomp_sync_threads. There was an idea to bypass taking the lock by > checking PF_EXITING, but it can be set without holding siglock if > threads have SIGNAL_GROUP_EXIT. This means it can happen concurently > with seccomp_filter_release. > > Signed-off-by: Andrei Vagin <avagin@google.com> > --- > kernel/exit.c | 3 ++- > kernel/seccomp.c | 22 ++++++++++++++++------ > 2 files changed, 18 insertions(+), 7 deletions(-) Reviewed-by: Oleg Nesterov <oleg@redhat.com> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/3] seccomp: release task filters when the task exits 2024-05-23 1:45 ` [PATCH 2/3] seccomp: release task filters when the task exits Andrei Vagin 2024-05-23 9:00 ` Oleg Nesterov @ 2024-06-26 18:57 ` Kees Cook 1 sibling, 0 replies; 8+ messages in thread From: Kees Cook @ 2024-06-26 18:57 UTC (permalink / raw) To: Andrei Vagin Cc: Andy Lutomirski, Will Drewry, Oleg Nesterov, Christian Brauner, linux-kernel, Tycho Andersen, Jens Axboe On Thu, May 23, 2024 at 01:45:39AM +0000, Andrei Vagin wrote: > Previously, seccomp filters were released in release_task(), which > required the process to exit and its zombie to be collected. However, > exited threads/processes can't trigger any seccomp events, making it > more logical to release filters upon task exits. > > This adjustment simplifies scenarios where a parent is tracing its child > process. The parent process can now handle all events from a seccomp > listening descriptor and then call wait to collect a child zombie. > > seccomp_filter_release takes the siglock to avoid races with > seccomp_sync_threads. There was an idea to bypass taking the lock by > checking PF_EXITING, but it can be set without holding siglock if > threads have SIGNAL_GROUP_EXIT. This means it can happen concurently > with seccomp_filter_release. > > Signed-off-by: Andrei Vagin <avagin@google.com> > --- > kernel/exit.c | 3 ++- > kernel/seccomp.c | 22 ++++++++++++++++------ > 2 files changed, 18 insertions(+), 7 deletions(-) > > diff --git a/kernel/exit.c b/kernel/exit.c > index 41a12630cbbc..23439c021d8d 100644 > --- a/kernel/exit.c > +++ b/kernel/exit.c > @@ -278,7 +278,6 @@ void release_task(struct task_struct *p) > } > > write_unlock_irq(&tasklist_lock); > - seccomp_filter_release(p); > proc_flush_pid(thread_pid); > put_pid(thread_pid); > release_thread(p); > @@ -836,6 +835,8 @@ void __noreturn do_exit(long code) > io_uring_files_cancel(); > exit_signals(tsk); /* sets PF_EXITING */ > > + seccomp_filter_release(tsk); > + > acct_update_integrals(tsk); > group_dead = atomic_dec_and_test(&tsk->signal->live); > if (group_dead) { > diff --git a/kernel/seccomp.c b/kernel/seccomp.c > index 35435e8f1035..67305e776dd3 100644 > --- a/kernel/seccomp.c > +++ b/kernel/seccomp.c > @@ -502,6 +502,9 @@ static inline pid_t seccomp_can_sync_threads(void) > /* Skip current, since it is initiating the sync. */ > if (thread == caller) > continue; > + /* Skip exited threads. */ > + if (thread->flags & PF_EXITING) > + continue; > > if (thread->seccomp.mode == SECCOMP_MODE_DISABLED || > (thread->seccomp.mode == SECCOMP_MODE_FILTER && > @@ -563,18 +566,18 @@ static void __seccomp_filter_release(struct seccomp_filter *orig) > * @tsk: task the filter should be released from. > * > * This function should only be called when the task is exiting as > - * it detaches it from its filter tree. As such, READ_ONCE() and > - * barriers are not needed here, as would normally be needed. > + * it detaches it from its filter tree. PF_EXITING has to be set > + * for the task. Let's capture this requirement with a WARN_ON() (like was done for the sighand case before). So before the spinlock, check for PF_EXITING and fail safe (don't release): if (WARN_ON((tsk->flags & PF_EXITING) == 0)) return; > */ > void seccomp_filter_release(struct task_struct *tsk) > { > - struct seccomp_filter *orig = tsk->seccomp.filter; > - > - /* We are effectively holding the siglock by not having any sighand. */ > - WARN_ON(tsk->sighand != NULL); > + struct seccomp_filter *orig; > > + spin_lock_irq(¤t->sighand->siglock); Shouldn't this be "tsk" not "current"? > + orig = tsk->seccomp.filter; > /* Detach task from its filter tree. */ > tsk->seccomp.filter = NULL; > + spin_unlock_irq(¤t->sighand->siglock); Same. > __seccomp_filter_release(orig); > } > > @@ -602,6 +605,13 @@ static inline void seccomp_sync_threads(unsigned long flags) > if (thread == caller) > continue; > > + /* > + * Skip exited threads. seccomp_filter_release could have > + * been already called for this task. > + */ > + if (thread->flags & PF_EXITING) > + continue; > + > /* Get a task reference for the new leaf node. */ > get_seccomp_filter(caller); > > -- > 2.45.1.288.g0e0cd299f1-goog > Otherwise, looks good! -- Kees Cook ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2024-06-26 18:57 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20240514175551.297237-1-avagin@google.com>
[not found] ` <20240514175551.297237-3-avagin@google.com>
[not found] ` <20240515125113.GC6821@redhat.com>
[not found] ` <CAEWA0a5dBvRwGAnztL56i=JV-WGGiaTd-GdJYdOxZmq1c+bdpg@mail.gmail.com>
2024-05-16 9:34 ` [PATCH 2/3] seccomp: release task filters when the task exits Oleg Nesterov
2024-05-16 13:09 ` Oleg Nesterov
2024-05-22 6:49 ` Andrei Vagin
2024-05-22 7:06 ` Andrei Vagin
2024-05-22 10:35 ` Oleg Nesterov
2024-05-23 1:45 [PATCH 0/3 v2] seccomp: improve handling of SECCOMP_IOCTL_NOTIF_RECV Andrei Vagin
2024-05-23 1:45 ` [PATCH 2/3] seccomp: release task filters when the task exits Andrei Vagin
2024-05-23 9:00 ` Oleg Nesterov
2024-06-26 18:57 ` Kees Cook
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox