* [PATCH v2 1/3] signal: change force_sig_info_to_task() to call __send_signal_locked()
@ 2026-06-19 13:27 Oleg Nesterov
From: Oleg Nesterov @ 2026-06-19 13:27 UTC
To: Andrew Morton
Cc: Andy Lutomirski, Eric W. Biederman, Kees Cook, Kusaram Devineni,
Peter Zijlstra, Thomas Gleixner, Will Drewry, linux-kernel

force_sig_info_to_task() calls send_signal_locked() which does two
things on top of __send_signal_locked():

1. The namespace translation of si_pid/si_uid. However, forced signals
   carry fault info (si_addr, si_call_addr, si_syscall), not pid/uid.
   The force_sig*() API should never be used to send signals with
   meaningful si_pid/si_uid; forced signals are always "from kernel".

   There are a few users of force_sig(SIGKILL), and in this case
   send_signal_locked() -> has_si_pid_and_uid() returns true.
   However, __send_signal_locked() simply ignores kernel_siginfo if
   sig == SIGKILL.

   (and in fact force_sig(SIGKILL) makes little sense, they should
   use send_sig(SIGKILL, p, 1) instead)

2. The "force" computation. However, for the forced signals, the
   unconditional force == true works just fine.

   If the target is ptraced, the "force" arg has no effect unless
   sig == SIGKILL.

   Otherwise, this check in sig_task_ignored()

	if (unlikely(t->signal->flags & SIGNAL_UNKILLABLE) &&
	    handler == SIG_DFL && !(force && sig_kernel_only(sig)))
		return true;

   has no effect, because force_sig_info_to_task() clears
   SIGNAL_UNKILLABLE if handler == SIG_DFL.

   The only behavioral difference is another check in sig_task_ignored():

	if (unlikely((t->flags & PF_KTHREAD) &&
		     (handler == SIG_KTHREAD_KERNEL) && !force))

   So with this patch a kthread that called allow_kernel_signal()
   for a fault signal would now receive the forced signal instead
   of silently ignoring it.

   And this is arguably more correct, even if I don't think that
   the force_sig*() API should be used in this case.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/signal.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/signal.c b/kernel/signal.c
index 9c2b32c4d755..68af503ed43c 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1315,7 +1315,7 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
 	if (action->sa.sa_handler == SIG_DFL &&
 	    (!t->ptrace || (handler == HANDLER_EXIT)))
 		t->signal->flags &= ~SIGNAL_UNKILLABLE;
-	ret = send_signal_locked(sig, info, t, PIDTYPE_PID);
+	ret = __send_signal_locked(sig, info, t, PIDTYPE_PID, true);
 	/* This can happen if the signal was already pending and blocked */
 	if (!task_sigpending(t))
 		signal_wake_up(t, 0);
--
2.52.0
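
As an illustration of the two sig_task_ignored() checks quoted in the changelog above, here is a minimal userspace sketch, not kernel code. The `model_*` names and the struct layout are hypothetical stand-ins for SIGNAL_UNKILLABLE, PF_KTHREAD and the handler value. Only the two quoted conditions are modeled; the rest of sig_task_ignored() is omitted.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's handler values. */
enum model_handler {
	MODEL_SIG_DFL,
	MODEL_SIG_KTHREAD_KERNEL,
	MODEL_USER_HANDLER,
};

/* Hypothetical, simplified view of the task state the two checks read. */
struct model_task {
	bool unkillable;		/* models SIGNAL_UNKILLABLE */
	bool kthread;			/* models PF_KTHREAD */
	enum model_handler handler;
};

/*
 * Models only the two checks quoted in the changelog. kernel_only stands
 * for sig_kernel_only(sig), i.e. the signal is SIGKILL or SIGSTOP.
 */
bool model_sig_task_ignored(const struct model_task *t, bool kernel_only,
			    bool force)
{
	if (t->unkillable && t->handler == MODEL_SIG_DFL &&
	    !(force && kernel_only))
		return true;

	if (t->kthread && t->handler == MODEL_SIG_KTHREAD_KERNEL && !force)
		return true;

	return false;
}
```

With `kthread` set and `handler == MODEL_SIG_KTHREAD_KERNEL`, the function returns true for `force == false` and false for `force == true`, which is the one behavioral difference the changelog describes. Once `unkillable` is cleared, the first check cannot fire whatever the value of `force`.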

* [PATCH v2 2/3] signal: turn the "bool force" arg of __send_signal_locked() into "int flags"
@ 2026-06-19 13:27 Oleg Nesterov
From: Oleg Nesterov @ 2026-06-19 13:27 UTC
To: Andrew Morton
Cc: Andy Lutomirski, Eric W. Biederman, Kees Cook, Kusaram Devineni,
Peter Zijlstra, Thomas Gleixner, Will Drewry, linux-kernel

No functional change. Preparation for the next patch which will add
another flag to fix the SA_IMMUTABLE signal evasion.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/signal.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/kernel/signal.c b/kernel/signal.c
index 68af503ed43c..9c607a598ba1 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1037,8 +1037,10 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
 	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
 }

+#define SEND_SIGNAL_FORCE	(1 << 0)
+
 static int __send_signal_locked(int sig, struct kernel_siginfo *info,
-				struct task_struct *t, enum pid_type type, bool force)
+				struct task_struct *t, enum pid_type type, int flags)
 {
 	struct sigpending *pending;
 	struct sigqueue *q;
@@ -1048,7 +1050,7 @@ static int __send_signal_locked(int sig, struct kernel_siginfo *info,
 	lockdep_assert_held(&t->sighand->siglock);

 	result = TRACE_SIGNAL_IGNORED;
-	if (!prepare_signal(sig, t, force))
+	if (!prepare_signal(sig, t, flags & SEND_SIGNAL_FORCE))
 		goto ret;

 	pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending;
@@ -1211,7 +1213,8 @@ int send_signal_locked(int sig, struct kernel_siginfo *info,
 			force = true;
 		}
 	}
-	return __send_signal_locked(sig, info, t, type, force);
+	return __send_signal_locked(sig, info, t, type,
+				    force ? SEND_SIGNAL_FORCE : 0);
 }

 static void print_fatal_signal(int signr)
@@ -1295,6 +1298,7 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
 	unsigned long int flags;
 	int ret, blocked, ignored;
 	struct k_sigaction *action;
+	int send_flags = SEND_SIGNAL_FORCE;
 	int sig = info->si_signo;

 	spin_lock_irqsave(&t->sighand->siglock, flags);
@@ -1315,7 +1319,7 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
 	if (action->sa.sa_handler == SIG_DFL &&
 	    (!t->ptrace || (handler == HANDLER_EXIT)))
 		t->signal->flags &= ~SIGNAL_UNKILLABLE;
-	ret = __send_signal_locked(sig, info, t, PIDTYPE_PID, true);
+	ret = __send_signal_locked(sig, info, t, PIDTYPE_PID, send_flags);
 	/* This can happen if the signal was already pending and blocked */
 	if (!task_sigpending(t))
 		signal_wake_up(t, 0);
@@ -1550,7 +1554,7 @@ int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr,

 	if (sig) {
 		if (lock_task_sighand(p, &flags)) {
-			ret = __send_signal_locked(sig, &info, p, PIDTYPE_TGID, false);
+			ret = __send_signal_locked(sig, &info, p, PIDTYPE_TGID, 0);
 			unlock_task_sighand(p, &flags);
 		} else
 			ret = -ESRCH;
@@ -2259,7 +2263,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 	 * parent's namespaces.
 	 */
 	if (sig)
-		__send_signal_locked(sig, &info, tsk->parent, PIDTYPE_TGID, false);
+		__send_signal_locked(sig, &info, tsk->parent, PIDTYPE_TGID, 0);
 	__wake_up_parent(tsk, tsk->parent);
 	spin_unlock_irqrestore(&psig->siglock, flags);

--
2.52.0
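
The bool-to-flags conversion in patch 2/3 follows a common C pattern, sketched below as a standalone example. `SEND_SIGNAL_FORCE` is the name used by the patch; `MODEL_FLAG_OTHER` and the `model_*` helpers are hypothetical and exist only for illustration.

```c
#include <stdbool.h>

#define SEND_SIGNAL_FORCE	(1 << 0)	/* name taken from the patch */
#define MODEL_FLAG_OTHER	(1 << 1)	/* hypothetical second flag */

/* Caller side: turn a legacy bool into the flags word. */
int model_flags_from_force(bool force)
{
	return force ? SEND_SIGNAL_FORCE : 0;
}

/* Callee side: recover a bool from the flags word. */
bool model_force_from_flags(int flags)
{
	/* Conversion to bool yields true for any nonzero masked value. */
	return flags & SEND_SIGNAL_FORCE;
}

/* Same idea for a flag that does not live in bit 0. */
bool model_other_from_flags(int flags)
{
	return flags & MODEL_FLAG_OTHER;
}
```

The point of the sketch is that a masked value such as `flags & MODEL_FLAG_OTHER` (numerically 2) still converts to `true` when assigned to or returned as a `bool`, so adding further flag bits does not disturb callers that only test for "forced".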

* [PATCH v2 3/3] signal: fix evasion of SA_IMMUTABLE signals
@ 2026-06-19 13:28 Oleg Nesterov
From: Oleg Nesterov @ 2026-06-19 13:28 UTC
To: Andrew Morton
Cc: Andy Lutomirski, Eric W. Biederman, Kees Cook, Kusaram Devineni,
Peter Zijlstra, Thomas Gleixner, Will Drewry, linux-kernel

force_sig_info_to_task(HANDLER_EXIT) sets SA_IMMUTABLE to ensure a forced
fatal signal cannot be ignored or caught by userspace; it must always
terminate the target.

However, if get_signal() dequeues another synchronous signal first, and
that signal has a handler and its sa_mask includes the fatal SA_IMMUTABLE
signal, the task can return to userspace and survive.

So dequeue_synchronous_signal() must always dequeue an SA_IMMUTABLE signal
first. But it relies on the SI_FROMKERNEL() check and picks the first one
it sees in pending->list, and thus we have the following problems:

- If the same signal was already pending and blocked, the new siginfo
  with .si_code > 0 will be lost.

  Change __send_signal_locked() to bypass the legacy_queue() check in
  this case.

- If force_sig_info_to_task() races with another synchronous/SI_FROMKERNEL
  signal, that signal can be picked first.

  Change __send_signal_locked() to add an SA_IMMUTABLE signal at the
  start of pending->list.

- SA_IMMUTABLE implies override_rlimit == true, but GFP_ATOMIC can fail
  anyway.

  Change __send_signal_locked() to escalate to SIGKILL in this (very
  unlikely) case. Not perfect and perhaps deserves WARN() or
  pr_warn_ratelimited(), but better than nothing.

However, unlike get_signal(), __send_signal_locked() can not rely on the
k_sigaction.sa.sa_flags & SA_IMMUTABLE check; another signal with the same
.si_signo can come before dequeue_synchronous_signal() dequeues the signal
sent by force(HANDLER_EXIT). Say, send_sig_perf() from task_work_run(), and
this signal is SI_FROMKERNEL() too.

Use the new SEND_SIGNAL_IMMUTABLE flag to pass the "immutable" state from
force_sig_info_to_task() to __send_signal_locked().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/signal.c | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
diff --git a/kernel/signal.c b/kernel/signal.c
index 9c607a598ba1..077effd21582 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1038,10 +1038,12 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
 }

 #define SEND_SIGNAL_FORCE	(1 << 0)
+#define SEND_SIGNAL_IMMUTABLE	(1 << 1)

 static int __send_signal_locked(int sig, struct kernel_siginfo *info,
 				struct task_struct *t, enum pid_type type, int flags)
 {
+	bool immutable = flags & SEND_SIGNAL_IMMUTABLE;
 	struct sigpending *pending;
 	struct sigqueue *q;
 	int override_rlimit;
@@ -1055,12 +1057,12 @@ static int __send_signal_locked(int sig, struct kernel_siginfo *info,

 	pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending;
 	/*
-	 * Short-circuit ignored signals and support queuing
-	 * exactly one non-rt signal, so that we can get more
-	 * detailed information about the cause of the signal.
+	 * Queue exactly one non-rt signal so that we can get more
+	 * detailed information about the cause. But we must never
+	 * lose the siginfo for an SA_IMMUTABLE signal.
 	 */
 	result = TRACE_SIGNAL_ALREADY_PENDING;
-	if (legacy_queue(pending, sig))
+	if (legacy_queue(pending, sig) && !immutable)
 		goto ret;

 	result = TRACE_SIGNAL_DELIVERED;
@@ -1087,7 +1089,12 @@ static int __send_signal_locked(int sig, struct kernel_siginfo *info,
 	q = sigqueue_alloc(sig, t, GFP_ATOMIC, override_rlimit);

 	if (q) {
-		list_add_tail(&q->list, &pending->list);
+		/* Ensure dequeue_synchronous_signal() sees SA_IMMUTABLE first */
+		if (immutable)
+			list_add(&q->list, &pending->list);
+		else
+			list_add_tail(&q->list, &pending->list);
+
 		switch ((unsigned long) info) {
 		case (unsigned long) SEND_SIG_NOINFO:
 			clear_siginfo(&q->info);
@@ -1130,6 +1137,9 @@ static int __send_signal_locked(int sig, struct kernel_siginfo *info,
 		 * send the signal, but the *info bits are lost.
 		 */
 		result = TRACE_SIGNAL_LOSE_INFO;
+		/* The task must not escape SA_IMMUTABLE; escalate to SIGKILL */
+		if (immutable)
+			sig = SIGKILL;
 	}

 out_set:
@@ -1307,8 +1317,10 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
 	blocked = sigismember(&t->blocked, sig);
 	if (blocked || ignored || (handler != HANDLER_CURRENT)) {
 		action->sa.sa_handler = SIG_DFL;
-		if (handler == HANDLER_EXIT)
+		if (handler == HANDLER_EXIT) {
 			action->sa.sa_flags |= SA_IMMUTABLE;
+			send_flags |= SEND_SIGNAL_IMMUTABLE;
+		}
 		if (blocked)
 			sigdelset(&t->blocked, sig);
 	}
--
2.52.0
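
The ordering argument in patch 3/3 (a reader that picks the first matching entry sees the immutable signal first only if it is queued at the head) can be sketched in userspace. This is a simplified model, not kernel code. The array-backed `struct model_pending` and the `model_*` functions are hypothetical, and every queued entry is assumed to be a synchronous SI_FROMKERNEL signal, so "first entry" stands in for what dequeue_synchronous_signal() would pick.

```c
#include <stdbool.h>
#include <string.h>

#define MODEL_MAX_PENDING 8

/* Hypothetical, array-backed stand-in for pending->list. */
struct model_pending {
	int sig[MODEL_MAX_PENDING];
	int count;
};

/*
 * Queue sig at the head when immutable is set (models list_add()),
 * otherwise at the tail (models list_add_tail()).
 * Returns false if the model queue is full.
 */
bool model_queue(struct model_pending *p, int sig, bool immutable)
{
	if (p->count == MODEL_MAX_PENDING)
		return false;

	if (immutable) {
		/* Shift existing entries up by one slot to free index 0. */
		memmove(&p->sig[1], &p->sig[0], p->count * sizeof(p->sig[0]));
		p->sig[0] = sig;
	} else {
		p->sig[p->count] = sig;
	}
	p->count++;
	return true;
}

/* Models "pick the first entry seen"; returns 0 if nothing is queued. */
int model_first(const struct model_pending *p)
{
	return p->count ? p->sig[0] : 0;
}
```

Queuing an ordinary entry and then an immutable one makes `model_first()` return the immutable one; queuing both with `immutable == false` returns the older entry, which is the race the changelog describes.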

* [PATCH 0/11] Short circuit delivery for coredump signals
@ 2026-06-26 16:52 Eric W. Biederman
From: Eric W. Biederman @ 2026-06-26 16:52 UTC
To: Andrew Morton
Cc: Andy Lutomirski, Kees Cook, Kusaram Devineni, Peter Zijlstra,
Thomas Gleixner, Will Drewry, linux-kernel, Oleg Nesterov

Oleg's recent patchset tweaking how force_sig_info works has inspired
me to finally push through and update the signal handling to have
proper short circuit delivery for coredump signals.

Everything is just simpler when coredumps are not such a large special
case.

What makes this tricky is that coredumps have had their own process
shoot-down logic, similar to but separate and different from everything
else in the kernel.

The bulk of this set of changes is merging the process shoot-down logic
that is used for signals and the logic for coredumps, so that the same
process shoot-down logic can be shared.

With the shoot-down logic sorted the rest is quite straightforward.

Who should pick up these changes? Historically I would put it in my
own tree but unfortunately I just have a little bit of time here and
there, and I can't predict when I will have time to work on things.

Eric W. Biederman (11):
  signal: Compute the exit_code in get_signal
  signal: In get_signal call do_exit when it is unnecessary to shoot down threads
  signal: Bring down all threads when handling a non-coredump fatal signal
  signal: Move stopping for the coredump from do_exit into get_signal
  signal: Move audit_core_dumps from do_coredump into get_signal
  coredump: In zap_threads complete startup if there is no need to wait
  signal: Use the thread killing in get_signal for coredumps
  exit: Make do_group_exit static
  signal: Dequeue fatal signals
  signal: Short circuit deliver coredump signals
  signal: Remove SA_IMMUTABLE

fs/coredump.c | 161 +++++++++++++++++----------------
include/linux/coredump.h | 4 +
include/linux/sched/signal.h | 2 +
include/linux/sched/task.h | 1 -
include/linux/signal_types.h | 3 -
include/uapi/asm-generic/signal-defs.h | 1 -
kernel/exit.c | 41 ++-------
kernel/signal.c | 119 +++++++++++++++---------
mm/oom_kill.c | 2 +-
9 files changed, 171 insertions(+), 163 deletions(-)

* [PATCH 01/11] signal: Compute the exit_code in get_signal
@ 2026-06-26 16:54 Eric W. Biederman
From: Eric W. Biederman @ 2026-06-26 16:54 UTC
To: Andrew Morton
Cc: Andy Lutomirski, Kees Cook, Kusaram Devineni, Peter Zijlstra,
Thomas Gleixner, Will Drewry, linux-kernel, Oleg Nesterov

Update get_signal so it calls do_group_exit with the correct
exit_code.

Make the default exit_code 0, so that the special case for threads
killed by de_thread falls out naturally.

Update do_group_exit to trust the exit_code passed in except when
SIGNAL_GROUP_EXIT is set.

Moving the computation of exit_code into get_signal makes other
cleanups possible.
---
kernel/exit.c | 4 ++--
kernel/signal.c | 12 ++++++++----
2 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/kernel/exit.c b/kernel/exit.c
index f50d73c272d6..ae143be7c831 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1098,7 +1098,7 @@ do_group_exit(int exit_code)
 	if (sig->flags & SIGNAL_GROUP_EXIT)
 		exit_code = sig->group_exit_code;
 	else if (sig->group_exec_task)
-		exit_code = 0;
+		;
 	else {
 		struct sighand_struct *const sighand = current->sighand;

@@ -1107,7 +1107,7 @@ do_group_exit(int exit_code)
 			/* Another thread got here before we took the lock. */
 			exit_code = sig->group_exit_code;
 		else if (sig->group_exec_task)
-			exit_code = 0;
+			;
 		else {
 			sig->group_exit_code = exit_code;
 			sig->flags = SIGNAL_GROUP_EXIT;
diff --git a/kernel/signal.c b/kernel/signal.c
index 9c2b32c4d755..39fbf9c9474a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2865,10 +2865,13 @@ bool get_signal(struct ksignal *ksig)
 	for (;;) {
 		struct k_sigaction *ka;
 		enum pid_type type;
+		int exit_code = 0;

 		/* Has this task already been marked for death? */
 		if ((signal->flags & SIGNAL_GROUP_EXIT) ||
 		     signal->group_exec_task) {
+			if (signal->flags & SIGNAL_GROUP_EXIT)
+				exit_code = signal->group_exit_code;
 			signr = SIGKILL;
 			sigdelset(&current->pending.signal, SIGKILL);
 			trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO,
@@ -2998,14 +3001,15 @@ bool get_signal(struct ksignal *ksig)
 			continue;
 		}

+		/*
+		 * Anything else is fatal, maybe with a core dump.
+		 */
+		exit_code = signr;
 	fatal:
 		spin_unlock_irq(&sighand->siglock);
 		if (unlikely(cgroup_task_frozen(current)))
 			cgroup_leave_frozen(true);

-		/*
-		 * Anything else is fatal, maybe with a core dump.
-		 */
 		current->flags |= PF_SIGNALED;

 		if (sig_kernel_coredump(signr)) {
@@ -3035,7 +3039,7 @@ bool get_signal(struct ksignal *ksig)
 		/*
 		 * Death signals, no core dump.
 		 */
-		do_group_exit(signr);
+		do_group_exit(exit_code);
 		/* NOTREACHED */
 	}
 	spin_unlock_irq(&sighand->siglock);
--
2.41.0

* [PATCH 02/11] signal: In get_signal call do_exit when it is unnecessary to shoot down threads
@ 2026-06-26 16:54 Eric W. Biederman
From: Eric W. Biederman @ 2026-06-26 16:54 UTC
To: Andrew Morton
Cc: Andy Lutomirski, Kees Cook, Kusaram Devineni, Peter Zijlstra,
Thomas Gleixner, Will Drewry, linux-kernel, Oleg Nesterov

In get_signal if other threads of the current process do not need to
be shot down calling do_group_exit is equivalent to calling do_exit.

The code in get_signal is only responsible for shooting down threads
when it dequeues a signal and decides the signal is fatal.

To remove special cases and make the code easier to read, call do_exit
instead of do_group_exit when no other threads need to be shot down.

With do_group_exit no longer being called when exec is terminating
threads in de_thread remove the special case in do_group_exit for
handling exec.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
kernel/exit.c | 4 ----
kernel/signal.c | 7 ++++++-
2 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/kernel/exit.c b/kernel/exit.c
index ae143be7c831..4bfecf2a510d 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1097,8 +1097,6 @@ do_group_exit(int exit_code)

 	if (sig->flags & SIGNAL_GROUP_EXIT)
 		exit_code = sig->group_exit_code;
-	else if (sig->group_exec_task)
-		;
 	else {
 		struct sighand_struct *const sighand = current->sighand;

@@ -1106,8 +1104,6 @@ do_group_exit(int exit_code)
 		if (sig->flags & SIGNAL_GROUP_EXIT)
 			/* Another thread got here before we took the lock. */
 			exit_code = sig->group_exit_code;
-		else if (sig->group_exec_task)
-			;
 		else {
 			sig->group_exit_code = exit_code;
 			sig->flags = SIGNAL_GROUP_EXIT;
diff --git a/kernel/signal.c b/kernel/signal.c
index 39fbf9c9474a..d98307964ee5 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2863,6 +2863,7 @@ bool get_signal(struct ksignal *ksig)
 	}

 	for (;;) {
+		bool group_exit_needed = false;
 		struct k_sigaction *ka;
 		enum pid_type type;
 		int exit_code = 0;
@@ -3005,6 +3006,7 @@ bool get_signal(struct ksignal *ksig)
 		 * Anything else is fatal, maybe with a core dump.
 		 */
 		exit_code = signr;
+		group_exit_needed = true;
 	fatal:
 		spin_unlock_irq(&sighand->siglock);
 		if (unlikely(cgroup_task_frozen(current)))
@@ -3039,7 +3041,10 @@ bool get_signal(struct ksignal *ksig)
 		/*
 		 * Death signals, no core dump.
 		 */
-		do_group_exit(exit_code);
+		if (group_exit_needed)
+			do_group_exit(exit_code);
+		else
+			do_exit(exit_code);
 		/* NOTREACHED */
 	}
 	spin_unlock_irq(&sighand->siglock);
--
2.41.0

* [PATCH 03/11] signal: Bring down all threads when handling a non-coredump fatal signal
@ 2026-06-26 16:55 Eric W. Biederman
From: Eric W. Biederman @ 2026-06-26 16:55 UTC
To: Andrew Morton
Cc: Andy Lutomirski, Kees Cook, Kusaram Devineni, Peter Zijlstra,
Thomas Gleixner, Will Drewry, linux-kernel, Oleg Nesterov

For non-coredump fatal signals instead of dropping and reacquiring
siglock to shoot down the other threads from do_group_exit at the end
of get_signal, shoot down the other threads before siglock is dropped.

This can not be done for coredump signals yet, because do_coredump
needs to be in a position to catch dying threads before it kills them
so it can make certain to catch them, so they can be added to the
coredump.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
kernel/signal.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/kernel/signal.c b/kernel/signal.c
index d98307964ee5..d111b779cbdb 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -3006,7 +3006,21 @@ bool get_signal(struct ksignal *ksig)
 		 * Anything else is fatal, maybe with a core dump.
 		 */
 		exit_code = signr;
-		group_exit_needed = true;
+		if (sig_kernel_coredump(signr))
+			group_exit_needed = true;
+		else {
+			struct task_struct *t;
+			signal->flags = SIGNAL_GROUP_EXIT;
+			signal->group_exit_code = signr;
+			signal->group_stop_count = 0;
+			__for_each_thread(signal, t) {
+				task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
+				if (t != current) {
+					sigaddset(&t->pending.signal, SIGKILL);
+					signal_wake_up(t, 1);
+				}
+			}
+		}
 	fatal:
 		spin_unlock_irq(&sighand->siglock);
 		if (unlikely(cgroup_task_frozen(current)))
--
2.41.0

* [PATCH 04/11] signal: Move stopping for the coredump from do_exit into get_signal
@ 2026-06-26 16:55 Eric W. Biederman
From: Eric W. Biederman @ 2026-06-26 16:55 UTC
To: Andrew Morton
Cc: Andy Lutomirski, Kees Cook, Kusaram Devineni, Peter Zijlstra,
Thomas Gleixner, Will Drewry, linux-kernel, Oleg Nesterov

Stopping to participate in a coredump from a kernel oops makes no
sense and is actively dangerous because the kernel is known to be
broken. Considering stopping for a coredump from a kernel thread exit
is silly because userspace coredumps are not generated from kernel
threads. Not stopping for a coredump in exit(2) and exit_group(2) and
related userspace exits that call do_exit or do_group_exit directly is
the current behavior of the code as the PF_SIGNALED test in
coredump_task_exit attests.

Since only tasks that pass through get_signal and set PF_SIGNALED can
join coredumps, move stopping for coredumps into get_signal, where the
PF_SIGNALED test is unnecessary. This avoids even the potential of
stopping for coredumps in the silly or dangerous places.

This can be seen to be safe by examining the few places that call
do_exit:

- get_signal calling do_group_exit
  Called by get_signal to terminate the userspace process. As stopping
  for the coredump now happens in get_signal the code will continue to
  participate in the coredump.

- exit_group(2) calling do_group_exit
  If a thread calls exit_group(2) while another thread in the same
  process is performing a coredump there is a race. The thread that
  wins the race will take the lock and set SIGNAL_GROUP_EXIT. If it is
  the thread that called do_group_exit then zap_threads will return
  -EAGAIN and no coredump will be generated. If it is the thread that
  is coredumping that wins the race, the task that called do_group_exit
  will exit gracefully with an error code before the coredump begins.

  Having a single thread exit just before the coredump starts is not
  ideal as the semantics make no sense. (Did the group exit happen
  before the coredump or did the coredump happen before the group
  exit?).

  Eventually I intend for group exits to flow through get_signal and
  this silliness will no longer be possible. Until then the current
  behavior when this race occurs is maintained.

- io_uring
  Called after get_signal returns to terminate the I/O worker thread
  (essentially a userspace thread that only runs kernel code) so that
  additional cleanup code can be run before do_exit. As get_signal is
  called prior to do_exit, the code will continue to participate in the
  coredump.

- make_task_dead
  Called on an unhandled kernel or hardware failure. As the failure is
  unhandled any extra work has the potential to make the failure worse
  so being part of a coredump is not appropriate.

- kthread_exit
  Called to terminate a kernel thread; as such, coredumps do not exist.

- call_usermodehelper_exec_async
  Called to terminate a kernel thread if kernel_execve fails; as it is
  a kernel thread, coredumps do not exist.

- reboot, seccomp
  These calls of do_exit() are semantically direct calls of exit(2)
  today. As do_exit() does not synchronize with siglock there is no
  logical race between a coredump killing the thread and these threads
  exiting. These threads logically exit before the coredump happens.
  This is also the current behavior so there is nothing to be concerned
  about with respect to userspace semantics or regressions.

Moving the coredump stop for userspace threads that did not dequeue
the coredumping signal from do_exit into get_signal in general is
safe, because the coredump in the single threaded case completely
happens in get_signal. The code movement ensures that a multi-threaded
coredump will not have any issues because the additional threads stop
after some amount of cleanup has been done.

The coredump code is robust to all kinds of userspace changes
happening in parallel as multiple processes can share a mm. This makes
it safe to perform the coredump before the io_uring cleanup happens as
io_uring can't do anything another process sharing the mm would not be
doing.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
fs/coredump.c | 25 ++++++++++++++++++++++++-
include/linux/coredump.h | 2 ++
kernel/exit.c | 35 +++++++----------------------------
kernel/signal.c | 5 +++++
mm/oom_kill.c | 2 +-
5 files changed, 39 insertions(+), 30 deletions(-)
diff --git a/fs/coredump.c b/fs/coredump.c
index bb6fdb1f458e..96801792a80e 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -521,6 +521,29 @@ static int zap_threads(struct task_struct *tsk,
 	return nr;
 }

+void coredump_join(struct core_state *core_state)
+{
+	/* Stop and join the in-progress coredump */
+	struct core_thread self;
+
+	self.task = current;
+	self.next = xchg(&core_state->dumper.next, &self);
+	/*
+	 * Implies mb(), the result of xchg() must be visible
+	 * to core_state->dumper.
+	 */
+	if (atomic_dec_and_test(&core_state->nr_threads))
+		complete(&core_state->startup);
+
+	for (;;) {
+		set_current_state(TASK_IDLE|TASK_FREEZABLE);
+		if (!self.task) /* see coredump_finish() */
+			break;
+		schedule();
+	}
+	__set_current_state(TASK_RUNNING);
+}
+
 static int coredump_wait(int exit_code, struct core_state *core_state)
 {
 	struct task_struct *tsk = current;
@@ -567,7 +590,7 @@ static void coredump_finish(bool core_dumped)
 		next = curr->next;
 		task = curr->task;
 		/*
-		 * see coredump_task_exit(), curr->task must not see
+		 * see coredump_join(), curr->task must not see
 		 * ->task == NULL before we read ->next.
 		 */
 		smp_mb();
diff --git a/include/linux/coredump.h b/include/linux/coredump.h
index 68861da4cf7c..c183c95f9063 100644
--- a/include/linux/coredump.h
+++ b/include/linux/coredump.h
@@ -43,6 +43,7 @@ extern int dump_emit(struct coredump_params *cprm, const void *addr, int nr);
 extern int dump_align(struct coredump_params *cprm, int align);
 int dump_user_range(struct coredump_params *cprm, unsigned long start,
 		    unsigned long len);
+extern void coredump_join(struct core_state *core_state);
 extern void vfs_coredump(const kernel_siginfo_t *siginfo);

 /*
@@ -63,6 +64,7 @@ extern void vfs_coredump(const kernel_siginfo_t *siginfo);
 #define coredump_report_failure(fmt, ...) __COREDUMP_PRINTK(KERN_WARNING, fmt, ##__VA_ARGS__)

 #else
+extern inline void coredump_join(struct core_state *core_state) {}
 static inline void vfs_coredump(const kernel_siginfo_t *siginfo) {}

 #define coredump_report(...)
diff --git a/kernel/exit.c b/kernel/exit.c
index 4bfecf2a510d..20dfa8b2101f 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -421,32 +421,6 @@ kill_orphaned_pgrp(struct task_struct *tsk, struct task_struct *parent)
 	}
 }

-static void coredump_task_exit(struct task_struct *tsk,
-			       struct core_state *core_state)
-{
-	struct core_thread self;
-
-	self.task = tsk;
-	if (self.task->flags & PF_SIGNALED)
-		self.next = xchg(&core_state->dumper.next, &self);
-	else
-		self.task = NULL;
-	/*
-	 * Implies mb(), the result of xchg() must be visible
-	 * to core_state->dumper.
-	 */
-	if (atomic_dec_and_test(&core_state->nr_threads))
-		complete(&core_state->startup);
-
-	for (;;) {
-		set_current_state(TASK_IDLE|TASK_FREEZABLE);
-		if (!self.task) /* see coredump_finish() */
-			break;
-		schedule();
-	}
-	__set_current_state(TASK_RUNNING);
-}
-
 #ifdef CONFIG_MEMCG
 /* drops tasklist_lock if succeeds */
 static bool __try_to_set_owner(struct task_struct *tsk, struct mm_struct *mm)
@@ -889,8 +863,13 @@ static void synchronize_group_exit(struct task_struct *tsk, long code)
 	core_state = signal->core_state;
 	spin_unlock_irq(&sighand->siglock);

-	if (unlikely(core_state))
-		coredump_task_exit(tsk, core_state);
+	/*
+	 * Decrement ->nr_threads and possibly complete
+	 * core_state->startup to politely skip participating in any
+	 * pending coredumps.
+	 */
+	if (unlikely(core_state) && atomic_dec_and_test(&core_state->nr_threads))
+		complete(&core_state->startup);
 }

 void __noreturn do_exit(long code)
diff --git a/kernel/signal.c b/kernel/signal.c
index d111b779cbdb..c211b520982f 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2864,6 +2864,7 @@ bool get_signal(struct ksignal *ksig)

 	for (;;) {
 		bool group_exit_needed = false;
+		struct core_state *core_state;
 		struct k_sigaction *ka;
 		enum pid_type type;
 		int exit_code = 0;
@@ -3022,6 +3023,7 @@ bool get_signal(struct ksignal *ksig)
 			}
 		}
 	fatal:
+		core_state = signal->core_state;
 		spin_unlock_irq(&sighand->siglock);
 		if (unlikely(cgroup_task_frozen(current)))
 			cgroup_leave_frozen(true);
@@ -3041,6 +3043,9 @@ bool get_signal(struct ksignal *ksig)
 			 * that value and ignore the one we pass it.
 			 */
 			vfs_coredump(&ksig->info);
+		} else if (core_state) {
+			/* Wait for the coredump to happen */
+			coredump_join(core_state);
 		}

 		/*
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 5f372f6e26fa..ff9d59963561 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -840,7 +840,7 @@ static inline bool __task_will_free_mem(struct task_struct *task)

 	/*
 	 * A coredumping process may sleep for an extended period in
-	 * coredump_task_exit(), so the oom killer cannot assume that
+	 * get_signal(), so the oom killer cannot assume that
 	 * the process will promptly exit and release memory.
 	 */
 	if (sig->core_state)
--
2.41.0
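
The two check-in paths that patch 04/11 separates (join the dump via xchg() and the countdown, or only take part in the countdown) can be sketched with C11 atomics. This is a userspace model under stated assumptions, not the kernel implementation: all `model_*` names are hypothetical, `atomic_exchange()` stands in for xchg(), `atomic_fetch_sub()` for atomic_dec_and_test(), a flag replaces complete(&core_state->startup), and the sleep loop in coredump_join() is not modeled.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct core_thread. */
struct model_thread {
	struct model_thread *next;
};

/* Hypothetical stand-in for struct core_state. */
struct model_core_state {
	atomic_int nr_threads;			/* threads the dumper waits for */
	_Atomic(struct model_thread *) head;	/* models dumper.next */
	atomic_bool startup_done;		/* models complete(&startup) */
};

void model_init(struct model_core_state *cs, int nr)
{
	atomic_init(&cs->nr_threads, nr);
	atomic_init(&cs->head, NULL);
	atomic_init(&cs->startup_done, false);
}

/* Shared countdown: the last thread to check in signals the dumper. */
static void model_check_in(struct model_core_state *cs)
{
	/* fetch_sub returns the old value, so 1 means we reached zero. */
	if (atomic_fetch_sub(&cs->nr_threads, 1) == 1)
		atomic_store(&cs->startup_done, true);
}

/* Models coredump_join(): push self onto the list, then check in. */
void model_join(struct model_core_state *cs, struct model_thread *self)
{
	self->next = atomic_exchange(&cs->head, self);
	model_check_in(cs);
}

/* Models the synchronize_group_exit() path: check in without joining. */
void model_skip(struct model_core_state *cs)
{
	model_check_in(cs);
}
```

In this model a thread that calls `model_skip()` still lets the countdown reach zero but never appears on the list the dumper walks, which mirrors the changelog's "politely skip participating" behavior for exits that do not pass through get_signal.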

* [PATCH 05/11] signal: Move audit_core_dumps from do_coredump into get_signal
@ 2026-06-26 16:56 Eric W. Biederman
From: Eric W. Biederman @ 2026-06-26 16:56 UTC
To: Andrew Morton
Cc: Andy Lutomirski, Kees Cook, Kusaram Devineni, Peter Zijlstra,
Thomas Gleixner, Will Drewry, linux-kernel, Oleg Nesterov

The function audit_core_dumps is not about the coredumps but about
detecting the conditions that would trigger a coredump, and logging
something when that happens.

The function audit_core_dumps runs even if a coredump never happens.

So move audit_core_dumps out of do_coredump and into get_signal to make
it clear it does not care about the actual core dumps.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
fs/coredump.c | 2 --
kernel/signal.c | 1 +
2 files changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/coredump.c b/fs/coredump.c
index 96801792a80e..14ec61c8d982 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -1205,8 +1205,6 @@ void vfs_coredump(const kernel_siginfo_t *siginfo)
 		.cpu = raw_smp_processor_id(),
 	};

-	audit_core_dumps(siginfo->si_signo);
-
 	if (coredump_skip(&cprm, binfmt))
 		return;

diff --git a/kernel/signal.c b/kernel/signal.c
index c211b520982f..986221bb0e0a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -3034,6 +3034,7 @@ bool get_signal(struct ksignal *ksig)
 			if (print_fatal_signals)
 				print_fatal_signal(signr);
 			proc_coredump_connector(current);
+			audit_core_dumps(ksig->info.si_signo);
 			/*
 			 * If it was able to dump core, this kills all
 			 * other threads in the group and synchronizes with
--
2.41.0
* [PATCH 06/11] coredump: In zap_threads complete startup if there is no need to wait 2026-06-26 16:52 ` [PATCH 0/11] Short circuit delivery for coredump signals Eric W. Biederman ` (4 preceding siblings ...) 2026-06-26 16:56 ` [PATCH 05/11] signal: Move audit_core_dumps from do_coredump " Eric W. Biederman @ 2026-06-26 16:57 ` Eric W. Biederman 2026-06-26 16:57 ` [PATCH 07/11] signal: Use the thread killing in get_signal for coredumps Eric W. Biederman ` (5 subsequent siblings) 11 siblings, 0 replies; 36+ messages in thread From: Eric W. Biederman @ 2026-06-26 16:57 UTC (permalink / raw) To: Andrew Morton Cc: Andy Lutomirski, Kees Cook, Kusaram Devineni, Peter Zijlstra, Thomas Gleixner, Will Drewry, linux-kernel, Oleg Nesterov Remove the need to test the value of core_waiters in coredump_wait by completing core_state->startup when there is an error or there are no other tasks to wait for. This slightly simplifies the logic and prepares for moving zap_threads out of coredump_wait. Signed-off-by: "Eric W. 
Biederman" <ebiederm@xmission.com> --- fs/coredump.c | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/fs/coredump.c b/fs/coredump.c index 14ec61c8d982..0aa235429cfa 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -517,6 +517,8 @@ static int zap_threads(struct task_struct *tsk, tsk->flags |= PF_DUMPCORE; atomic_set(&core_state->nr_threads, nr); } + if (nr <= 0) + complete(&core_state->startup); spin_unlock_irq(&tsk->sighand->siglock); return nr; } @@ -547,28 +549,26 @@ void coredump_join(struct core_state *core_state) static int coredump_wait(int exit_code, struct core_state *core_state) { struct task_struct *tsk = current; - int core_waiters = -EBUSY; + struct core_thread *ptr; + int core_waiters; init_completion(&core_state->startup); core_state->dumper.task = tsk; core_state->dumper.next = NULL; core_waiters = zap_threads(tsk, core_state, exit_code); - if (core_waiters > 0) { - struct core_thread *ptr; - wait_for_completion_state(&core_state->startup, - TASK_UNINTERRUPTIBLE|TASK_FREEZABLE); - /* - * Wait for all the threads to become inactive, so that - * all the thread context (extended register state, like - * fpu etc) gets copied to the memory. - */ - ptr = core_state->dumper.next; - while (ptr != NULL) { - wait_task_inactive(ptr->task, TASK_ANY); - ptr = ptr->next; - } + wait_for_completion_state(&core_state->startup, + TASK_UNINTERRUPTIBLE|TASK_FREEZABLE); + /* + * Wait for all the threads to become inactive, so that + * all the thread context (extended register state, like + * fpu etc) gets copied to the memory. + */ + ptr = core_state->dumper.next; + while (ptr != NULL) { + wait_task_inactive(ptr->task, TASK_ANY); + ptr = ptr->next; } return core_waiters; -- 2.41.0 ^ permalink raw reply related [flat|nested] 36+ messages in thread
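The effect of the zap_threads() change in 06/11 can be modelled in userspace. What follows is only a sketch under stated assumptions, not kernel code: every toy_* name is hypothetical, the completion is reduced to a flag, and locking and atomics are left out. It shows why the waiter no longer needs to test core_waiters: when there is an error or no other thread to wait for, startup is already complete.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct completion. */
struct toy_completion {
	bool done;
};

/* Hypothetical stand-in for struct core_state (only the fields used here). */
struct toy_core_state {
	int nr_threads;			/* threads that still have to check in */
	struct toy_completion startup;
};

/*
 * Models the tail of zap_threads() after the patch. nr is either a
 * negative error value (no dump was started) or the number of other
 * threads that will take part in the coredump.
 */
static int toy_zap_threads(struct toy_core_state *cs, int nr)
{
	cs->startup.done = false;
	if (nr > 0)
		cs->nr_threads = nr;
	if (nr <= 0)
		cs->startup.done = true;	/* nobody to wait for */
	return nr;
}

/* Models a thread checking in: the last one completes startup. */
static void toy_thread_checkin(struct toy_core_state *cs)
{
	assert(cs->nr_threads > 0);
	if (--cs->nr_threads == 0)
		cs->startup.done = true;
}

/* Models the waiter: true if waiting on startup would return at once. */
static bool toy_wait_would_return(const struct toy_core_state *cs)
{
	return cs->startup.done;
}
```

With this shape the waiter can call its wait unconditionally. Whether the real wait_for_completion_state() call has any other observable effect in the error case is not modelled here.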
* [PATCH 07/11] signal: Use the thread killing in get_signal for coredumps
2026-06-26 16:52 ` [PATCH 0/11] Short circuit delivery for coredump signals Eric W. Biederman
` (5 preceding siblings ...)
2026-06-26 16:57 ` [PATCH 06/11] coredump: In zap_threads complete startup if there is no need to wait Eric W. Biederman
@ 2026-06-26 16:57 ` Eric W. Biederman
2026-06-26 16:58 ` [PATCH 08/11] exit: Make do_group_exit static Eric W. Biederman
` (4 subsequent siblings)
11 siblings, 0 replies; 36+ messages in thread
From: Eric W. Biederman @ 2026-06-26 16:57 UTC (permalink / raw)
To: Andrew Morton
Cc: Andy Lutomirski, Kees Cook, Kusaram Devineni, Peter Zijlstra,
Thomas Gleixner, Will Drewry, linux-kernel, Oleg Nesterov

Now that coredumps are per process there is no reason for the coredump
code to have its own routine to kill the threads of the process.

The coredump code does need to have a routine to catch the threads that
will be part of the coredump, and to only catch them if a coredump will
be generated.

Split out coredump_begin from do_coredump so that the threads of the
process can be caught in the coredump. Also move the logic to decide if
a coredump should be generated into coredump_begin, with do_coredump now
simply returning immediately if coredump_begin has decided not to
capture a coredump.

Update get_signal to always shoot down the threads of the process, and
to call coredump_begin if a coredump needs to be started. Remove the
call of do_group_exit in get_signal as it is unnecessary.

The practical reason for splitting coredump_begin out from do_coredump
is so that I don't have to analyze if cgroup_leave_frozen,
print_fatal_signal, proc_coredump_connector and audit_core_dumps can
safely be called under siglock.

Signed-off-by: "Eric W.
Biederman" <ebiederm@xmission.com> --- fs/coredump.c | 114 +++++++++++++++-------------------- include/linux/coredump.h | 2 + include/linux/sched/signal.h | 1 + kernel/signal.c | 44 +++++--------- 4 files changed, 66 insertions(+), 95 deletions(-) diff --git a/fs/coredump.c b/fs/coredump.c index 0aa235429cfa..26bd1b3e9a03 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -481,46 +481,49 @@ static bool coredump_parse(struct core_name *cn, struct coredump_params *cprm, return true; } -static int zap_process(struct signal_struct *signal, int exit_code) +static inline bool coredump_skip(unsigned long mm_flags, + const struct linux_binfmt *binfmt) +{ + if (!binfmt) + return true; + if (!binfmt->core_dump) + return true; + if (!__get_dumpable(mm_flags)) + return true; + return false; +} + +void coredump_begin(struct core_state *core_state) { + /* Called with siglock held */ + struct task_struct *tsk = current; + struct signal_struct *signal = tsk->signal; + struct mm_struct *mm = tsk->mm; + struct linux_binfmt * binfmt = mm->binfmt; + unsigned long mm_flags = __mm_flags_get_dumpable(mm); struct task_struct *t; int nr = 0; - signal->flags = SIGNAL_GROUP_EXIT; - signal->group_exit_code = exit_code; - signal->group_stop_count = 0; + if (coredump_skip(mm_flags, binfmt)) + return; - __for_each_thread(signal, t) { - task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK); - if (t != current && !(t->flags & PF_POSTCOREDUMP)) { - sigaddset(&t->pending.signal, SIGKILL); - signal_wake_up(t, 1); - nr++; - } - } + init_completion(&core_state->startup); + core_state->dumper.task = tsk; + core_state->dumper.next = NULL; + core_state->mm_flags = mm_flags; - return nr; -} + /* Count how may other threads will participate in the coredump */ + __for_each_thread(signal, t) + nr += (t != tsk) && !(t->flags & PF_POSTCOREDUMP); -static int zap_threads(struct task_struct *tsk, - struct core_state *core_state, int exit_code) -{ - struct signal_struct *signal = tsk->signal; - int nr = -EAGAIN; - - 
spin_lock_irq(&tsk->sighand->siglock); - if (!(signal->flags & SIGNAL_GROUP_EXIT) && !signal->group_exec_task) { - /* Allow SIGKILL, see prepare_signal() */ - signal->core_state = core_state; - nr = zap_process(signal, exit_code); - clear_tsk_thread_flag(tsk, TIF_SIGPENDING); - tsk->flags |= PF_DUMPCORE; - atomic_set(&core_state->nr_threads, nr); - } - if (nr <= 0) + atomic_set(&core_state->nr_threads, nr); + if (nr == 0) complete(&core_state->startup); - spin_unlock_irq(&tsk->sighand->siglock); - return nr; + + /* Allow SIGKILL, see prepare_signal() */ + signal->core_state = core_state; + clear_tsk_thread_flag(tsk, TIF_SIGPENDING); + tsk->flags |= PF_DUMPCORE; } void coredump_join(struct core_state *core_state) @@ -546,17 +549,9 @@ void coredump_join(struct core_state *core_state) __set_current_state(TASK_RUNNING); } -static int coredump_wait(int exit_code, struct core_state *core_state) +static void coredump_wait(struct core_state *core_state) { - struct task_struct *tsk = current; struct core_thread *ptr; - int core_waiters; - - init_completion(&core_state->startup); - core_state->dumper.task = tsk; - core_state->dumper.next = NULL; - - core_waiters = zap_threads(tsk, core_state, exit_code); wait_for_completion_state(&core_state->startup, TASK_UNINTERRUPTIBLE|TASK_FREEZABLE); @@ -570,8 +565,6 @@ static int coredump_wait(int exit_code, struct core_state *core_state) wait_task_inactive(ptr->task, TASK_ANY); ptr = ptr->next; } - - return core_waiters; } static void coredump_finish(bool core_dumped) @@ -1101,18 +1094,6 @@ static void coredump_cleanup(struct core_name *cn, struct coredump_params *cprm) coredump_finish(cn->core_dumped); } -static inline bool coredump_skip(const struct coredump_params *cprm, - const struct linux_binfmt *binfmt) -{ - if (!binfmt) - return true; - if (!binfmt->core_dump) - return true; - if (!__get_dumpable(cprm->mm_flags)) - return true; - return false; -} - static void do_coredump(struct core_name *cn, struct coredump_params *cprm, 
size_t **argv, int *argc, const struct linux_binfmt *binfmt) { @@ -1185,7 +1166,7 @@ static void do_coredump(struct core_name *cn, struct coredump_params *cprm, void vfs_coredump(const kernel_siginfo_t *siginfo) { size_t *argv __free(kfree) = NULL; - struct core_state core_state; + struct core_state *core_state = current->signal->core_state; struct core_name cn; const struct mm_struct *mm = current->mm; const struct linux_binfmt *binfmt = mm->binfmt; @@ -1193,21 +1174,21 @@ void vfs_coredump(const kernel_siginfo_t *siginfo) struct coredump_params cprm = { .siginfo = siginfo, .limit = rlimit(RLIMIT_CORE), - /* - * We must use the same mm->flags while dumping core to avoid - * inconsistency of bit flags, since this flag is not protected - * by any locks. - * - * Note that we only care about MMF_DUMP* flags. - */ - .mm_flags = __mm_flags_get_dumpable(mm), .vma_meta = NULL, .cpu = raw_smp_processor_id(), }; - if (coredump_skip(&cprm, binfmt)) + /* coredump_begin decided not to coredump */ + if (!core_state) return; + /* + * We must use the same mm->flags while dumping core to avoid + * inconsistency of bit flags, since this flag is not protected + * by any locks. 
+ */ + cprm.mm_flags = core_state->mm_flags; + CLASS(prepare_creds, cred)(); if (!cred) return; @@ -1220,8 +1201,7 @@ void vfs_coredump(const kernel_siginfo_t *siginfo) if (coredump_force_suid_safe(&cprm)) cred->fsuid = GLOBAL_ROOT_UID; - if (coredump_wait(siginfo->si_signo, &core_state) < 0) - return; + coredump_wait(core_state); scoped_with_creds(cred) do_coredump(&cn, &cprm, &argv, &argc, binfmt); diff --git a/include/linux/coredump.h b/include/linux/coredump.h index c183c95f9063..d315ddccbf95 100644 --- a/include/linux/coredump.h +++ b/include/linux/coredump.h @@ -43,6 +43,7 @@ extern int dump_emit(struct coredump_params *cprm, const void *addr, int nr); extern int dump_align(struct coredump_params *cprm, int align); int dump_user_range(struct coredump_params *cprm, unsigned long start, unsigned long len); +extern void coredump_begin(struct core_state *core_state); extern void coredump_join(struct core_state *core_state); extern void vfs_coredump(const kernel_siginfo_t *siginfo); @@ -64,6 +65,7 @@ extern void vfs_coredump(const kernel_siginfo_t *siginfo); #define coredump_report_failure(fmt, ...) 
__COREDUMP_PRINTK(KERN_WARNING, fmt, ##__VA_ARGS__) #else +static inline void coredump_begin(struct core_state *core_state) {} extern inline void coredump_join(struct core_state *core_state) {} static inline void vfs_coredump(const kernel_siginfo_t *siginfo) {} diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h index 584ae88b435e..1ea0a89cbef0 100644 --- a/include/linux/sched/signal.h +++ b/include/linux/sched/signal.h @@ -80,6 +80,7 @@ struct core_thread { struct core_state { atomic_t nr_threads; + unsigned long mm_flags; struct core_thread dumper; struct completion startup; }; diff --git a/kernel/signal.c b/kernel/signal.c index 986221bb0e0a..89075c60b92b 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -2863,8 +2863,8 @@ bool get_signal(struct ksignal *ksig) } for (;;) { - bool group_exit_needed = false; - struct core_state *core_state; + struct core_state local_core_state, *core_state; + struct task_struct *t; struct k_sigaction *ka; enum pid_type type; int exit_code = 0; @@ -3007,22 +3007,20 @@ bool get_signal(struct ksignal *ksig) * Anything else is fatal, maybe with a core dump. 
*/ exit_code = signr; - if (sig_kernel_coredump(signr)) - group_exit_needed = true; - else { - struct task_struct *t; - signal->flags = SIGNAL_GROUP_EXIT; - signal->group_exit_code = signr; - signal->group_stop_count = 0; - __for_each_thread(signal, t) { - task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK); - if (t != current) { - sigaddset(&t->pending.signal, SIGKILL); - signal_wake_up(t, 1); - } + signal->flags = SIGNAL_GROUP_EXIT; + signal->group_exit_code = exit_code; + signal->group_stop_count = 0; + __for_each_thread(signal, t) { + task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK); + if (t != current) { + sigaddset(&t->pending.signal, SIGKILL); + signal_wake_up(t, 1); } } fatal: + /* Setup to collect a coredump */ + if (sig_kernel_coredump(signr)) + coredump_begin(&local_core_state); core_state = signal->core_state; spin_unlock_irq(&sighand->siglock); if (unlikely(cgroup_task_frozen(current))) @@ -3035,14 +3033,7 @@ bool get_signal(struct ksignal *ksig) print_fatal_signal(signr); proc_coredump_connector(current); audit_core_dumps(ksig->info.si_signo); - /* - * If it was able to dump core, this kills all - * other threads in the group and synchronizes with - * their demise. If we lost the race with another - * thread getting here, it set group_exit_code - * first and our do_group_exit call below will use - * that value and ignore the one we pass it. - */ + /* If dumping write out the coredump */ vfs_coredump(&ksig->info); } else if (core_state) { /* Wait for the coredump to happen */ @@ -3059,12 +3050,9 @@ bool get_signal(struct ksignal *ksig) goto out; /* - * Death signals, no core dump. + * Death signals. */ - if (group_exit_needed) - do_group_exit(exit_code); - else - do_exit(exit_code); + do_exit(exit_code); /* NOTREACHED */ } spin_unlock_irq(&sighand->siglock); -- 2.41.0 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 08/11] exit: Make do_group_exit static
2026-06-26 16:52 ` [PATCH 0/11] Short circuit delivery for coredump signals Eric W. Biederman
` (6 preceding siblings ...)
2026-06-26 16:57 ` [PATCH 07/11] signal: Use the thread killing in get_signal for coredumps Eric W. Biederman
@ 2026-06-26 16:58 ` Eric W. Biederman
2026-06-26 16:59 ` [PATCH 09/11] signal: Dequeue fatal signals Eric W. Biederman
` (3 subsequent siblings)
11 siblings, 0 replies; 36+ messages in thread
From: Eric W. Biederman @ 2026-06-26 16:58 UTC (permalink / raw)
To: Andrew Morton
Cc: Andy Lutomirski, Kees Cook, Kusaram Devineni, Peter Zijlstra,
Thomas Gleixner, Will Drewry, linux-kernel, Oleg Nesterov

Now that do_group_exit only has a single caller in exit.c make it static
so this is obvious.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/sched/task.h | 1 -
 kernel/exit.c              | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 41ed884cffc9..8b1a85a54999 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -90,7 +90,6 @@ static inline void exit_thread(struct task_struct *tsk)
 {
 }
 #endif
-extern __noreturn void do_group_exit(int);
 
 extern void exit_files(struct task_struct *);
 extern void exit_itimers(struct task_struct *);
diff --git a/kernel/exit.c b/kernel/exit.c
index 20dfa8b2101f..c4e7e71e83e2 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1069,7 +1069,7 @@ SYSCALL_DEFINE1(exit, int, error_code)
  * Take down every thread in the group. This is called by fatal signals
  * as well as by sys_exit_group (below).
  */
-void __noreturn
+static void __noreturn
 do_group_exit(int exit_code)
 {
 	struct signal_struct *sig = current->signal;
-- 
2.41.0

^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 09/11] signal: Dequeue fatal signals
2026-06-26 16:52 ` [PATCH 0/11] Short circuit delivery for coredump signals Eric W. Biederman
` (7 preceding siblings ...)
2026-06-26 16:58 ` [PATCH 08/11] exit: Make do_group_exit static Eric W. Biederman
@ 2026-06-26 16:59 ` Eric W. Biederman
2026-06-26 16:59 ` [PATCH 10/11] signal: Short circuit deliver coredump signals Eric W. Biederman
` (2 subsequent siblings)
11 siblings, 0 replies; 36+ messages in thread
From: Eric W. Biederman @ 2026-06-26 16:59 UTC (permalink / raw)
To: Andrew Morton
Cc: Andy Lutomirski, Kees Cook, Kusaram Devineni, Peter Zijlstra,
Thomas Gleixner, Will Drewry, linux-kernel, Oleg Nesterov

Fatal signals are detected early and historically have not been
dequeued. This barely matters as the process exits immediately. Not
dequeuing the signal is visible to userspace inspecting the dying
process through proc and will be to coredumps once we start using short
circuit delivery for them.

To keep things simple always populate siginfo in dequeue_exit_signal and
always pass the dequeued siginfo to trace_signal_deliver.

In the slim chance that the fatal signal was a posix timer free the
posix timer's sigqueue. In general this is not safe with tasklist_lock
held because tasklist_lock needs to nest under it_lock. In this case I
have read through posixtimer_sigqueue_putref and I cannot find it
taking the timer's it_lock.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/sched/signal.h |  1 +
 kernel/signal.c              | 40 ++++++++++++++++++++++++++++--------
 2 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 1ea0a89cbef0..df7a3c4530e4 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -262,6 +262,7 @@ struct signal_struct {
 #define SIGNAL_STOP_STOPPED	0x00000001 /* job control stop in effect */
 #define SIGNAL_STOP_CONTINUED	0x00000002 /* SIGCONT since WCONTINUED reap */
 #define SIGNAL_GROUP_EXIT	0x00000004 /* group exit in progress */
+#define SIGNAL_EXIT_DEQUEUE	0x00000008 /* Dequeue the exit signal */
 /*
  * Pending notifications to parent.
  */
diff --git a/kernel/signal.c b/kernel/signal.c
index 89075c60b92b..ce3a99573aa9 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -665,6 +665,34 @@ int dequeue_signal(sigset_t *mask, kernel_siginfo_t *info, enum pid_type *type)
 }
 EXPORT_SYMBOL_GPL(dequeue_signal);
 
+static int dequeue_exit_signal(
+	struct task_struct *tsk, int exit_code, kernel_siginfo_t *info)
+{
+	struct signal_struct *signal = tsk->signal;
+
+	if (signal->flags & SIGNAL_EXIT_DEQUEUE) {
+		struct sigpending *pending = NULL;
+		struct sigqueue *timer_sigq;
+		int signr = exit_code;
+
+		signal->flags &= ~SIGNAL_EXIT_DEQUEUE;
+
+		pending = sigismember(&tsk->pending.signal, signr) ?
+			&tsk->pending : &signal->shared_pending;
+
+		collect_signal(signr, pending, info, &timer_sigq);
+		if (unlikely(timer_sigq)) {
+			posixtimer_sigqueue_putref(timer_sigq);
+		}
+		return signr;
+	}
+	/* There is no short-circuit signal to dequeue -- fake something */
+	clear_siginfo(info);
+	info->si_signo = SIGKILL;
+	info->si_code = SI_KERNEL;
+	return info->si_signo;
+}
+
 static int dequeue_synchronous_signal(kernel_siginfo_t *info)
 {
 	struct task_struct *tsk = current;
@@ -1012,7 +1040,7 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
 			 * running and doing things after a slower
 			 * thread has the fatal signal pending.
 			 */
-			signal->flags = SIGNAL_GROUP_EXIT;
+			signal->flags = SIGNAL_GROUP_EXIT | SIGNAL_EXIT_DEQUEUE;
 			signal->group_exit_code = sig;
 			signal->group_stop_count = 0;
 			__for_each_thread(signal, t) {
@@ -2874,15 +2902,11 @@ bool get_signal(struct ksignal *ksig)
 		     signal->group_exec_task) {
 			if (signal->flags & SIGNAL_GROUP_EXIT)
 				exit_code = signal->group_exit_code;
-			signr = SIGKILL;
 			sigdelset(&current->pending.signal, SIGKILL);
-			trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO,
-				&sighand->action[SIGKILL-1]);
+			signr = dequeue_exit_signal(current, exit_code, &ksig->info);
+			trace_signal_deliver(signr, &ksig->info,
+				&sighand->action[signr-1]);
 			recalc_sigpending();
-			/*
-			 * implies do_group_exit() or return to PF_USER_WORKER,
-			 * no need to initialize ksig->info/etc.
-			 */
 			goto fatal;
 		}
 
-- 
2.41.0

^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 10/11] signal: Short circuit deliver coredump signals
2026-06-26 16:52 ` [PATCH 0/11] Short circuit delivery for coredump signals Eric W. Biederman
` (8 preceding siblings ...)
2026-06-26 16:59 ` [PATCH 09/11] signal: Dequeue fatal signals Eric W. Biederman
@ 2026-06-26 16:59 ` Eric W. Biederman
2026-06-26 17:00 ` [PATCH 11/11] signal: Remove SA_IMMUTABLE Eric W. Biederman
2026-06-28 14:29 ` [PATCH 0/11] Short circuit delivery for coredump signals Oleg Nesterov
11 siblings, 0 replies; 36+ messages in thread
From: Eric W. Biederman @ 2026-06-26 16:59 UTC (permalink / raw)
To: Andrew Morton
Cc: Andy Lutomirski, Kees Cook, Kusaram Devineni, Peter Zijlstra,
Thomas Gleixner, Will Drewry, linux-kernel, Oleg Nesterov

The coredump rendezvous start is the same as the process killing that
complete_signal performs. get_signal now gets the siginfo and the signal
number when a signal is short circuit delivered.

Start short circuit delivering coredump signals as there is nothing
remaining that prevents their short circuit delivery.

This means that processes that coredump will now exit faster and
fatal_signal_pending will return true until the coredump starts.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/signal.c | 29 +++++++++++++----------------
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index ce3a99573aa9..0b602dfb0b78 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1032,24 +1032,21 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
 	    (sig == SIGKILL || !p->ptrace)) {
 		/*
 		 * This signal will be fatal to the whole group.
+		 *
+		 * Start a group exit and wake everybody up.
+		 * This way we don't have other threads
+		 * running and doing things after a slower
+		 * thread has the fatal signal pending.
 		 */
-		if (!sig_kernel_coredump(sig)) {
-			/*
-			 * Start a group exit and wake everybody up.
-			 * This way we don't have other threads
-			 * running and doing things after a slower
-			 * thread has the fatal signal pending.
-			 */
-			signal->flags = SIGNAL_GROUP_EXIT | SIGNAL_EXIT_DEQUEUE;
-			signal->group_exit_code = sig;
-			signal->group_stop_count = 0;
-			__for_each_thread(signal, t) {
-				task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
-				sigaddset(&t->pending.signal, SIGKILL);
-				signal_wake_up(t, 1);
-			}
-			return;
+		signal->flags = SIGNAL_GROUP_EXIT | SIGNAL_EXIT_DEQUEUE;
+		signal->group_exit_code = sig;
+		signal->group_stop_count = 0;
+		__for_each_thread(signal, t) {
+			task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
+			sigaddset(&t->pending.signal, SIGKILL);
+			signal_wake_up(t, 1);
 		}
+		return;
 	}
 
 	/*
-- 
2.41.0

^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 11/11] signal: Remove SA_IMMUTABLE
2026-06-26 16:52 ` [PATCH 0/11] Short circuit delivery for coredump signals Eric W. Biederman
` (9 preceding siblings ...)
2026-06-26 16:59 ` [PATCH 10/11] signal: Short circuit deliver coredump signals Eric W. Biederman
@ 2026-06-26 17:00 ` Eric W. Biederman
2026-06-28 14:29 ` [PATCH 0/11] Short circuit delivery for coredump signals Oleg Nesterov
11 siblings, 0 replies; 36+ messages in thread
From: Eric W. Biederman @ 2026-06-26 17:00 UTC (permalink / raw)
To: Andrew Morton
Cc: Andy Lutomirski, Kees Cook, Kusaram Devineni, Peter Zijlstra,
Thomas Gleixner, Will Drewry, linux-kernel, Oleg Nesterov

Now that fatal signals experience short circuit delivery in
__send_signal_locked, there is no longer a race between sigaction
changing the signal handler and the signal handler being forced to
SIG_DFL for fatal forced signals.

So remove SA_IMMUTABLE whose job it was to stop that race.
---
 include/linux/signal_types.h           | 3 ---
 include/uapi/asm-generic/signal-defs.h | 1 -
 kernel/signal.c                        | 9 +--------
 3 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/include/linux/signal_types.h b/include/linux/signal_types.h
index caf4f7a59ab9..1a3bb540f1c7 100644
--- a/include/linux/signal_types.h
+++ b/include/linux/signal_types.h
@@ -70,9 +70,6 @@ struct ksignal {
 	int sig;
 };
 
-/* Used to kill the race between sigaction and forced signals */
-#define SA_IMMUTABLE		0x00800000
-
 #ifndef __ARCH_UAPI_SA_FLAGS
 #ifdef SA_RESTORER
 #define __ARCH_UAPI_SA_FLAGS	SA_RESTORER
diff --git a/include/uapi/asm-generic/signal-defs.h b/include/uapi/asm-generic/signal-defs.h
index 7572f2f46ee8..fe929e7b77ca 100644
--- a/include/uapi/asm-generic/signal-defs.h
+++ b/include/uapi/asm-generic/signal-defs.h
@@ -45,7 +45,6 @@
 #define SA_UNSUPPORTED	0x00000400
 #define SA_EXPOSE_TAGBITS	0x00000800
 /* 0x00010000 used on mips */
-/* 0x00800000 used for internal SA_IMMUTABLE */
 /* 0x01000000 used on x86 */
 /* 0x02000000 used on x86 */
 /*
diff --git a/kernel/signal.c b/kernel/signal.c
index 0b602dfb0b78..d1decfef86c0 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1328,8 +1328,6 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
 	blocked = sigismember(&t->blocked, sig);
 	if (blocked || ignored || (handler != HANDLER_CURRENT)) {
 		action->sa.sa_handler = SIG_DFL;
-		if (handler == HANDLER_EXIT)
-			action->sa.sa_flags |= SA_IMMUTABLE;
 		if (blocked)
 			sigdelset(&t->blocked, sig);
 	}
@@ -2946,8 +2944,7 @@ bool get_signal(struct ksignal *ksig)
 		if (!signr)
 			break; /* will return 0 */
 
-		if (unlikely(current->ptrace) && (signr != SIGKILL) &&
-		    !(sighand->action[signr -1].sa.sa_flags & SA_IMMUTABLE)) {
+		if (unlikely(current->ptrace) && (signr != SIGKILL)) {
 			signr = ptrace_signal(signr, &ksig->info, type);
 			if (!signr)
 				continue;
@@ -4351,10 +4348,6 @@ int do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact)
 	k = &p->sighand->action[sig-1];
 
 	spin_lock_irq(&p->sighand->siglock);
-	if (k->sa.sa_flags & SA_IMMUTABLE) {
-		spin_unlock_irq(&p->sighand->siglock);
-		return -EINVAL;
-	}
 	if (oact)
 		*oact = *k;
 
-- 
2.41.0

^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 0/11] Short circuit delivery for coredump signals
2026-06-26 16:52 ` [PATCH 0/11] Short circuit delivery for coredump signals Eric W. Biederman
` (10 preceding siblings ...)
2026-06-26 17:00 ` [PATCH 11/11] signal: Remove SA_IMMUTABLE Eric W. Biederman
@ 2026-06-28 14:29 ` Oleg Nesterov
2026-06-29 6:22 ` Eric W. Biederman
11 siblings, 1 reply; 36+ messages in thread
From: Oleg Nesterov @ 2026-06-28 14:29 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Andrew Morton, Andy Lutomirski, Kees Cook, Kusaram Devineni,
Peter Zijlstra, Thomas Gleixner, Will Drewry, linux-kernel,
Linus Torvalds

Eric,

Please rebase on top of Linus's tree, git am fails at 7/11.

So far I didn't try to read the individual patches, I've applied
the whole series on top of 25fe708bbc59 to avoid the conflicts, and
after a very quick glance I seem to see some problems.

Please correct me.

-------------------------------------------------------------------------
complete_signal() does:

	if (sig_fatal(p, sig) && !sigismember(&t->real_blocked, sig) &&
	    (sig == SIGKILL || !p->ptrace)) {
		/*
		 * This signal will be fatal to the whole group.
		 *
		 * Start a group exit and wake everybody up.
		 * This way we don't have other threads
		 * running and doing things after a slower
		 * thread has the fatal signal pending.
		 */
		signal->flags = SIGNAL_GROUP_EXIT | SIGNAL_EXIT_DEQUEUE;
		signal->group_exit_code = sig;
		... kill the thread group ...

However, prepare_signal() still does:

	if (signal->flags & SIGNAL_GROUP_EXIT) {
		if (signal->core_state)
			return sig == SIGKILL;
		/*
		 * The process is in the middle of dying, drop the signal.
		 */
		return false;

This means that if SIGKILL comes before coredump_begin() sets signal->core_state,
it will be lost.
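The lost-SIGKILL window described above can be reproduced with a small userspace model. This is a sketch under stated assumptions, not the kernel function: toy_prepare_signal() and the TOY_* constants are hypothetical names, only the quoted SIGNAL_GROUP_EXIT branch is mirrored, and every other path of prepare_signal() is collapsed into "queue the signal".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SIGKILL		9
#define TOY_SIGNAL_GROUP_EXIT	0x00000004u

/* Hypothetical stand-in for the two signal_struct fields the check reads. */
struct toy_signal {
	unsigned int flags;
	const void *core_state;	/* non-NULL once coredump_begin() has run */
};

/* Returns true if the signal would be queued, false if it is dropped. */
static bool toy_prepare_signal(const struct toy_signal *signal, int sig)
{
	assert(sig > 0);
	if (signal->flags & TOY_SIGNAL_GROUP_EXIT) {
		if (signal->core_state)
			return sig == TOY_SIGKILL;
		/* The process is in the middle of dying, drop the signal. */
		return false;
	}
	/* Simplification: all other checks are assumed to pass. */
	return true;
}
```

In this model, a group with TOY_SIGNAL_GROUP_EXIT set and core_state still NULL drops SIGKILL, while the same group accepts it once core_state is set. That is the interval between complete_signal() and coredump_begin() in the series as posted.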
-------------------------------------------------------------------------
dequeue_exit_signal:

	if (signal->flags & SIGNAL_EXIT_DEQUEUE) {
		struct sigpending *pending = NULL;
		struct sigqueue *timer_sigq;
		int signr = exit_code;

		signal->flags &= ~SIGNAL_EXIT_DEQUEUE;

		pending = sigismember(&tsk->pending.signal, signr) ?
			&tsk->pending : &signal->shared_pending;

		collect_signal(signr, pending, info, &timer_sigq);

This looks obviously wrong. 2 threads, T1 and T2. SIGSEGV is sent to T1.
T2 calls get_signal(), clears SIGNAL_EXIT_DEQUEUE and returns SIGSEGV.
But collect_signal() won't find SIGSEGV, *info will be bogus.

T2 calls coredump_begin() and initiates the coredump. The core dump will
be written with the wrong dumper thread and bogus siginfo (si_addr/etc
are lost). Even the filename is wrong if core_pattern includes "%i".

Oleg.

On 06/26, Eric W. Biederman wrote:
>
> Oleg's recent patchset tweaking how force_sig_info works has inspired me
> to finally push through and update the signal handling to have proper
> short circuit deliver for coredump signals. Everything is just simpler
> when coredumps are not such a large special case.
>
> What makes this tricky is coredumps have had their own process
> shoot-down logic similar to but separate and different from everything
> else in the kernel. The bulk of this set of changes is merging the
> process shoot-down logic that is used for signals and the logic for
> coredumps. So the same process shoot-down logic can be shared.
>
> With the shoot-down logic sorted the rest is quite straight forward.
>
> Who should pick up these changes? Historically I would put it in my own
> tree but unfortunately I just have a little bit of time here and there,
> and I can't predict when I will have time to work on things.
>
> Eric W. Biederman (11):
>   signal: Compute the exit_code in get_signal
>   signal: In get_signal call do_exit when it is unnecessary to shoot down threads
>   signal: Bring down all threads when handling a non-coredump fatal signal
>   signal: Move stopping for the coredump from do_exit into get_signal
>   signal: Move audit_core_dumps from do_coredump into get_signal
>   coredump: In zap_threads complete startup if there is no need to wait
>   signal: Use the thread killing in get_signal for coredumps
>   exit: Make do_group_exit static
>   signal: Dequeue fatal signals
>   signal: Short circuit deliver coredump signals
>   signal: Remove SA_IMMUTABLE
>
>  fs/coredump.c                          | 161 +++++++++++++++++----------------
>  include/linux/coredump.h               |   4 +
>  include/linux/sched/signal.h           |   2 +
>  include/linux/sched/task.h             |   1 -
>  include/linux/signal_types.h           |   3 -
>  include/uapi/asm-generic/signal-defs.h |   1 -
>  kernel/exit.c                          |  41 ++-------
>  kernel/signal.c                        | 119 +++++++++++++++---------
>  mm/oom_kill.c                          |   2 +-
>  9 files changed, 171 insertions(+), 163 deletions(-)
>

^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 0/11] Short circuit delivery for coredump signals 2026-06-28 14:29 ` [PATCH 0/11] Short circuit delivery for coredump signals Oleg Nesterov @ 2026-06-29 6:22 ` Eric W. Biederman 2026-06-29 17:45 ` Eric W. Biederman 2026-07-03 21:35 ` [PATCH v2 00/14] " Eric W. Biederman 0 siblings, 2 replies; 36+ messages in thread From: Eric W. Biederman @ 2026-06-29 6:22 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, Andy Lutomirski, Kees Cook, Kusaram Devineni, Peter Zijlstra, Thomas Gleixner, Will Drewry, linux-kernel, Linus Torvalds Oleg Nesterov <oleg@redhat.com> writes: > Eric, > > Please rebase on top of Linus's tree, git am fails at 7/11. This was built on v7.1. Now that v7.2-rc1 is out I will be happy to rebase on top of that. > So far I didnt' try to read the individual patches, I've applied > the whole series on top of 25fe708bbc59 to avoid the conflicts, and > after the very quick glance I seem to see some problems. > > Please correct me. > > ------------------------------------------------------------------------- > complete_signal() does: > > if (sig_fatal(p, sig) && !sigismember(&t->real_blocked, sig) && > (sig == SIGKILL || !p->ptrace)) { > /* > * This signal will be fatal to the whole group. > * > * Start a group exit and wake everybody up. > * This way we don't have other threads > * running and doing things after a slower > * thread has the fatal signal pending. > */ > signal->flags = SIGNAL_GROUP_EXIT | SIGNAL_EXIT_DEQUEUE; > signal->group_exit_code = sig; > ... kill the thread group ... > > However, prepare_signal() still does: > > if (signal->flags & SIGNAL_GROUP_EXIT) { > if (signal->core_state) > return sig == SIGKILL; > /* > * The process is in the middle of dying, drop the signal. > */ > return false; > > This means that if SIGKILL comes before coredump_begin() sets signal->core_state, > it will be lost. I will reexamine that. I used to have something to deal with this case but somehow convinced myself it didn't matter. 
> ------------------------------------------------------------------------- > dequeue_exit_signal: > > if (signal->flags & SIGNAL_EXIT_DEQUEUE) { > struct sigpending *pending = NULL; > struct sigqueue *timer_sigq; > int signr = exit_code; > > signal->flags &= ~SIGNAL_EXIT_DEQUEUE; > > pending = sigismember(&tsk->pending.signal, signr) ? > &tsk->pending : &signal->shared_pending; > > collect_signal(signr, pending, info, &timer_sigq); > > This looks obviously wrong. 2 threads, T1 and T2. SIGSEGV is sent to T1. > T2 calls get_signal(), clears SIGNAL_EXIT_DEQUEUE and returns SIGSEGV. > But collect_signal() won't find SIGSEGV, *info will be bogus. Ugh. I deliberately allowed the cross thread dumping so that whichever thread won the race could just dump core. I failed to consider it would be a problem for per thread signals. I will have to think a little bit about how to know which queue to remove the signal from. It is tempting to always place fatal signals on the shared_pending queue. Eric ^ permalink raw reply [flat|nested] 36+ messages in thread
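The two-thread scenario above can be reproduced in a small user-space model. This is only a sketch under stated assumptions: the model_* types and helpers are hypothetical stand-ins invented for illustration, pending signals are reduced to a bitmask, and only the queue choice quoted from dequeue_exit_signal is mirrored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model; none of these are the kernel's types. */
struct model_pending { uint64_t signal; };	/* bitmask of pending signals */
struct model_thread  { struct model_pending pending; };
struct model_group   { struct model_pending shared_pending; };

static uint64_t model_sigmask(int sig)
{
	return UINT64_C(1) << (sig - 1);
}

static bool model_sigismember(const struct model_pending *p, int sig)
{
	return (p->signal & model_sigmask(sig)) != 0;
}

static void model_raise(struct model_pending *p, int sig)
{
	p->signal |= model_sigmask(sig);
}

/*
 * Mirrors the quoted queue choice: consult the *calling* thread's
 * private queue, otherwise fall back to the shared queue.
 */
static struct model_pending *model_pick_queue(struct model_thread *self,
					      struct model_group *grp, int sig)
{
	return model_sigismember(&self->pending, sig) ?
		&self->pending : &grp->shared_pending;
}

/* Returns true only if the signal was found in (and removed from) p. */
static bool model_collect(struct model_pending *p, int sig)
{
	if (!model_sigismember(p, sig))
		return false;
	p->signal &= ~model_sigmask(sig);
	return true;
}
```

With SIGSEGV (11 in this model) raised on T1's private queue, model_collect(model_pick_queue(&t2, &grp, 11), 11) returns false: T2 looks in its own queue, falls back to the shared queue, and finds nothing, which corresponds to the bogus *info in the report. Placing fatal signals on the shared queue at enqueue time would make the fallback find them.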
* Re: [PATCH 0/11] Short circuit delivery for coredump signals 2026-06-29 6:22 ` Eric W. Biederman @ 2026-06-29 17:45 ` Eric W. Biederman 2026-07-02 10:36 ` Oleg Nesterov 2026-07-03 21:35 ` [PATCH v2 00/14] " Eric W. Biederman 1 sibling, 1 reply; 36+ messages in thread From: Eric W. Biederman @ 2026-06-29 17:45 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, Andy Lutomirski, Kees Cook, Kusaram Devineni, Peter Zijlstra, Thomas Gleixner, Will Drewry, linux-kernel, Linus Torvalds "Eric W. Biederman" <ebiederm@xmission.com> writes: > Oleg Nesterov <oleg@redhat.com> writes: > >> Eric, >> >> Please rebase on top of Linus's tree, git am fails at 7/11. > > This was built on v7.1. > > Now that v7.2-rc1 is out I will be happy to rebase on top of that. > >> So far I didnt' try to read the individual patches, I've applied >> the whole series on top of 25fe708bbc59 to avoid the conflicts, and >> after the very quick glance I seem to see some problems. >> >> Please correct me. >> >> ------------------------------------------------------------------------- >> complete_signal() does: >> >> if (sig_fatal(p, sig) && !sigismember(&t->real_blocked, sig) && >> (sig == SIGKILL || !p->ptrace)) { >> /* >> * This signal will be fatal to the whole group. >> * >> * Start a group exit and wake everybody up. >> * This way we don't have other threads >> * running and doing things after a slower >> * thread has the fatal signal pending. >> */ >> signal->flags = SIGNAL_GROUP_EXIT | SIGNAL_EXIT_DEQUEUE; >> signal->group_exit_code = sig; >> ... kill the thread group ... >> >> However, prepare_signal() still does: >> >> if (signal->flags & SIGNAL_GROUP_EXIT) { >> if (signal->core_state) >> return sig == SIGKILL; >> /* >> * The process is in the middle of dying, drop the signal. >> */ >> return false; >> >> This means that if SIGKILL comes before coredump_begin() sets signal->core_state, >> it will be lost. > > I will reexamine that. 
I used to have something to deal with this case
> but somehow convinced myself it didn't matter.

I was thinking of another related problem.

In this case losing SIGKILL before coredump_begin seems fine. The process is already dying of a signal.

The only point of supporting SIGKILL at all during a coredump is because writing the coredump out can be slow. So SIGKILL in that case just aborts the coredump. If the coredump hasn't started an abort seems pointless.

I can think of ways to tweak the logic but I can't imagine anything reliably working until the code reaches coredump_begin, and TIF_SIGPENDING is cleared.

Do you know of something where userspace actually depends upon killing a coredump before it even starts? If not I don't imagine it is worth spending any time on this corner case.

...

There is another corner case I just noticed.

When force_sig_info_to_task is passed HANDLER_EXIT today get_signal skips ptrace stops when SA_IMMUTABLE is set. My change did not short circuit deliver those signals when the process was ptraced so my last change removing SA_IMMUTABLE was premature.

Eric

^ permalink raw reply [flat|nested] 36+ messages in thread
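The window under discussion can be illustrated with a user-space model of the quoted prepare_signal() branch. This is a sketch, not kernel code: the MODEL_* constants and model_prepare_signal() are hypothetical names, and the boolean core_state stands in for signal->core_state being set by coredump_begin().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define MODEL_SIGNAL_GROUP_EXIT	0x1u
#define MODEL_SIGKILL		9
#define MODEL_SIGTERM		15

/*
 * Mirrors only the quoted SIGNAL_GROUP_EXIT branch of prepare_signal():
 * returns true if the signal would still be queued, false if dropped.
 */
static bool model_prepare_signal(unsigned int flags, bool core_state, int sig)
{
	if (flags & MODEL_SIGNAL_GROUP_EXIT) {
		/* Once the dump has started, only SIGKILL gets through. */
		if (core_state)
			return sig == MODEL_SIGKILL;
		/* Group exit started but no dump yet: everything is dropped. */
		return false;
	}
	return true;
}
```

In this model, model_prepare_signal(MODEL_SIGNAL_GROUP_EXIT, false, MODEL_SIGKILL) returns false, which is the lost SIGKILL, while the same call with core_state set to true returns true.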
* Re: [PATCH 0/11] Short circuit delivery for coredump signals 2026-06-29 17:45 ` Eric W. Biederman @ 2026-07-02 10:36 ` Oleg Nesterov 2026-07-03 20:16 ` Eric W. Biederman 0 siblings, 1 reply; 36+ messages in thread From: Oleg Nesterov @ 2026-07-02 10:36 UTC (permalink / raw) To: Eric W. Biederman Cc: Andrew Morton, Andy Lutomirski, Kees Cook, Kusaram Devineni, Peter Zijlstra, Thomas Gleixner, Will Drewry, linux-kernel, Linus Torvalds On 06/29, Eric W. Biederman wrote: > > "Eric W. Biederman" <ebiederm@xmission.com> writes: > > > Oleg Nesterov <oleg@redhat.com> writes: > > > >> if (signal->flags & SIGNAL_GROUP_EXIT) { > >> if (signal->core_state) > >> return sig == SIGKILL; > >> /* > >> * The process is in the middle of dying, drop the signal. > >> */ > >> return false; > >> > >> This means that if SIGKILL comes before coredump_begin() sets signal->core_state, > >> it will be lost. > > > > I will reexamine that. I used to have something to deal with this case > > but somehow convinced myself it didn't matter. > > I was thinking of another related problem. > > In this case loosing SIGKILL before coredump_begin seems fine. The process > is already dying of a signal. Hmm. I disagree... > The only point of supporting SIGKILL at all during a coredump is because > writing the coredump out can be slow. So SIGKILL in that case just > aborts the coredump. Yes, the coredumping process is not dead. Yet. It can do a lot of activity and use a lot of resources. > Do you know of something where userspace actually depends upon > killing a coredump before it even starts? Well. I think a user has all rights to assume that SIGKILL must always terminate the process asap, the process killed by SIGKILL must not start the coredumping. > There is another corner case I just noticed. > > When force_sig_info_to_task is passed HANDLER_EXIT today get_signal > skips ptrace stops when SA_IMMUTABLE is set. 
My change did not short
> circuit deliver those signals when the process was ptraced so my last
> change removing SA_IMMUTABLE was premature.

Yes... and do_sigaction() between send and delivery...

Oleg.

^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 0/11] Short circuit delivery for coredump signals 2026-07-02 10:36 ` Oleg Nesterov @ 2026-07-03 20:16 ` Eric W. Biederman 0 siblings, 0 replies; 36+ messages in thread From: Eric W. Biederman @ 2026-07-03 20:16 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, Andy Lutomirski, Kees Cook, Kusaram Devineni, Peter Zijlstra, Thomas Gleixner, Will Drewry, linux-kernel, Linus Torvalds Oleg Nesterov <oleg@redhat.com> writes: > On 06/29, Eric W. Biederman wrote: >> >> "Eric W. Biederman" <ebiederm@xmission.com> writes: >> >> > Oleg Nesterov <oleg@redhat.com> writes: >> > >> >> if (signal->flags & SIGNAL_GROUP_EXIT) { >> >> if (signal->core_state) >> >> return sig == SIGKILL; >> >> /* >> >> * The process is in the middle of dying, drop the signal. >> >> */ >> >> return false; >> >> >> >> This means that if SIGKILL comes before coredump_begin() sets signal->core_state, >> >> it will be lost. >> > >> > I will reexamine that. I used to have something to deal with this case >> > but somehow convinced myself it didn't matter. >> >> I was thinking of another related problem. >> >> In this case loosing SIGKILL before coredump_begin seems fine. The process >> is already dying of a signal. > > Hmm. I disagree... > >> The only point of supporting SIGKILL at all during a coredump is because >> writing the coredump out can be slow. So SIGKILL in that case just >> aborts the coredump. > > Yes, the coredumping process is not dead. Yet. It can do a lot of activity > and use a lot of resources. It is semantically dead. Pragmatically I completely agree. >> Do you know of something where userspace actually depends upon >> killing a coredump before it even starts? > > Well. I think a user has all rights to assume that SIGKILL must always > terminate the process asap, the process killed by SIGKILL must not start > the coredumping. 
If we arrange things so that semantically SIGKILL is delivered before the signal that triggers the coredump, we can do that without semantic complications. The window is tiny enough I am not certain it matters. There are other issues with that part of code as well. Can we please look at all of these issues after I post the next version of my patchset (later today)? I don't think anything else depends upon them. >> There is another corner case I just noticed. >> >> When force_sig_info_to_task is passed HANDLER_EXIT today get_signal >> skips ptrace stops when SA_IMMUTABLE is set. My change did not short >> circuit deliver those signals when the process was ptraced so my last >> change removing SA_IMMUTABLE was premature. > > Yes... and do_sigacttion() between send and delivery... HANDLER_EXIT implies the signal is fatal so there will be no userspace delivery. SA_IMMUTABLE only exists to ensure that when siglock is dropped that userspace won't change the way get_signal delivers the signal. If we can perform short circuited delivery of the fatal signal in all cases then SA_IMMUTABLE becomes unnecessary. Earlier you mentioned that force_sig_info_to_task combined with dequeue_synchronous signal could do a lot better. I looked back at my proof of concept branch and you are quite right. Especially once we can get short circuit delivery for the coredump signals. Eric ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v2 00/14] Short circuit delivery for coredump signals 2026-06-29 6:22 ` Eric W. Biederman 2026-06-29 17:45 ` Eric W. Biederman @ 2026-07-03 21:35 ` Eric W. Biederman 2026-07-03 21:36 ` [PATCH 01/14] signal: Generalize posixtimer_queue_sigqueue into enqueue_signal Eric W. Biederman ` (14 more replies) 1 sibling, 15 replies; 36+ messages in thread From: Eric W. Biederman @ 2026-07-03 21:35 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, Andy Lutomirski, Kees Cook, Kusaram Devineni, Peter Zijlstra, Thomas Gleixner, Will Drewry, linux-kernel, Linus Torvalds, Christian Brauner

Oleg's recent patchset tweaking how force_sig_info works has inspired me to finally push through and update the signal handling to have proper short circuit delivery for coredump signals.

Everything is just simpler when coredumps are not such a large special case.

What makes this tricky is that coredumps have had their own process shoot-down logic, similar to but separate and different from everything else in the kernel.

The bulk of this set of changes is merging the process shoot-down logic that is used for signals and the logic for coredumps, so that the same process shoot-down logic can be shared.

With the shoot-down logic sorted the rest is quite straightforward.

Oleg, when reviewing the first version of this set of changes, noticed that dequeue_exit_signal did not properly handle thread-local signals that trigger a coredump. To resolve this I have added a few more cleanups so that I can detect a fatal signal as it is being enqueued and place it in the shared signal queue.

One of those cleanups is a rewrite of detecting if a signal can be delivered immediately when sent, aka short circuit delivery. The processing of signals that will be ignored and signals that will cause a process to exit without returning to userspace (such as SIGKILL) are both enhanced.
This set of changes is against v7.2-rc1 fs/coredump.c | 161 +++++++++++++++++---------------- include/linux/coredump.h | 4 + include/linux/sched/signal.h | 2 + include/linux/sched/task.h | 1 - include/linux/signal_types.h | 3 - include/uapi/asm-generic/signal-defs.h | 1 - kernel/exit.c | 41 ++------- kernel/signal.c | 119 +++++++++++++++--------- mm/oom_kill.c | 2 +- 9 files changed, 171 insertions(+), 163 deletions(-) Eric W. Biederman (14): signal: Generalize posixtimer_queue_sigqueue into enqueue_signal signal: Factor out sig_blocked from sig_ignored signal: More accurate ignoring of signals based on sig_can_short_circuit signal: Use sig_can_short_circuit to improve fatal signal delivery signal: Compute the exit_code in get_signal signal: In get_signal call do_exit when it is unnecessary to shoot down threads signal: Bring down all threads when handling a non-coredump fatal signal signal: Move stopping for the coredump from do_exit into get_signal signal: Move audit_core_dumps from do_coredump into get_signal coredump: In zap_threads complete startup if there is no need to wait signal: Use the thread killing in get_signal for coredumps exit: Make do_group_exit static signal: Dequeue fatal signals signal: Short circuit deliver coredump signals fs/coredump.c | 153 ++++++++++++++------------- include/linux/coredump.h | 4 + include/linux/sched/signal.h | 2 + include/linux/sched/task.h | 1 - kernel/exit.c | 41 ++------ kernel/signal.c | 246 +++++++++++++++++++++++++++++-------------- mm/oom_kill.c | 2 +- 7 files changed, 260 insertions(+), 189 deletions(-) ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH 01/14] signal: Generalize posixtimer_queue_sigqueue into enqueue_signal 2026-07-03 21:35 ` [PATCH v2 00/14] " Eric W. Biederman @ 2026-07-03 21:36 ` Eric W. Biederman 2026-07-03 21:37 ` [PATCH 02/14] signal: Factor out sig_blocked from sig_ignored Eric W. Biederman ` (13 subsequent siblings) 14 siblings, 0 replies; 36+ messages in thread From: Eric W. Biederman @ 2026-07-03 21:36 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, Andy Lutomirski, Kees Cook, Kusaram Devineni, Peter Zijlstra, Thomas Gleixner, Will Drewry, linux-kernel, Linus Torvalds, Christian Brauner send_signal needs to do all of the same work as posixtimer_queue_sigqueue before calling complete_signal. The only difference is that send_signal might not allocate a sigqueue. So generalize the code to handle an absent sigqueue and create enqueue_signal. Then use enqueue_signal in place of posixtimer_queue_sigqueue and complete_signal. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- kernel/signal.c | 37 ++++++++++++++++++------------------- 1 file changed, 18 insertions(+), 19 deletions(-) diff --git a/kernel/signal.c b/kernel/signal.c index 9c2b32c4d755..6b49bae3fce7 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1032,6 +1032,21 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type) return; } +static void enqueue_signal(struct task_struct *t, enum pid_type type, + int sig, struct sigqueue *q) +{ + struct signal_struct *signal = t->signal; + struct sigpending *pending = (type != PIDTYPE_PID) ? 
+ &signal->shared_pending : &t->pending; + + signalfd_notify(t, sig); + if (q) { + list_add_tail(&q->list, &pending->list); + } + sigaddset(&pending->signal, sig); + complete_signal(sig, t, type); +} + static inline bool legacy_queue(struct sigpending *signals, int sig) { return (sig < SIGRTMIN) && sigismember(&signals->signal, sig); @@ -1085,7 +1100,6 @@ static int __send_signal_locked(int sig, struct kernel_siginfo *info, q = sigqueue_alloc(sig, t, GFP_ATOMIC, override_rlimit); if (q) { - list_add_tail(&q->list, &pending->list); switch ((unsigned long) info) { case (unsigned long) SEND_SIG_NOINFO: clear_siginfo(&q->info); @@ -1131,9 +1145,6 @@ static int __send_signal_locked(int sig, struct kernel_siginfo *info, } out_set: - signalfd_notify(t, sig); - sigaddset(&pending->signal, sig); - /* Let multiprocess signals appear after on-going forks */ if (type > PIDTYPE_TGID) { struct multiprocess_signals *delayed; @@ -1148,7 +1159,7 @@ static int __send_signal_locked(int sig, struct kernel_siginfo *info, } } - complete_signal(sig, t, type); + enqueue_signal(t, type, sig, q); ret: trace_signal_generate(sig, info, t, type != PIDTYPE_PID, result); return ret; @@ -1939,18 +1950,6 @@ bool posixtimer_init_sigqueue(struct sigqueue *q) return true; } -static void posixtimer_queue_sigqueue(struct sigqueue *q, struct task_struct *t, enum pid_type type) -{ - struct sigpending *pending; - int sig = q->info.si_signo; - - signalfd_notify(t, sig); - pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending; - list_add_tail(&q->list, &pending->list); - sigaddset(&pending->signal, sig); - complete_signal(sig, t, type); -} - /* * This function is used by POSIX timers to deliver a timer signal. 
* Where type is PIDTYPE_PID (such as for timers with SIGEV_THREAD_ID @@ -2076,7 +2075,7 @@ void posixtimer_send_sigqueue(struct k_itimer *tmr) else hlist_del_init(&tmr->ignored_list); - posixtimer_queue_sigqueue(q, t, tmr->it_pid_type); + enqueue_signal(t, tmr->it_pid_type, sig, q); result = TRACE_SIGNAL_DELIVERED; out: trace_signal_generate(sig, &q->info, t, tmr->it_pid_type != PIDTYPE_PID, result); @@ -2137,7 +2136,7 @@ static void posixtimer_sig_unignore(struct task_struct *tsk, int sig) guard(rcu)(); target = posixtimer_get_target(tmr); if (target) - posixtimer_queue_sigqueue(&tmr->sigq, target, tmr->it_pid_type); + enqueue_signal(target, tmr->it_pid_type, sig, &tmr->sigq); else posixtimer_putref(tmr); } -- 2.41.0 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 02/14] signal: Factor out sig_blocked from sig_ignored 2026-07-03 21:35 ` [PATCH v2 00/14] " Eric W. Biederman 2026-07-03 21:36 ` [PATCH 01/14] signal: Generalize posixtimer_queue_sigqueue into enqueue_signal Eric W. Biederman @ 2026-07-03 21:37 ` Eric W. Biederman 2026-07-03 21:37 ` [PATCH 03/14] signal: More accurate ignoring of signals based on sig_can_short_circuit Eric W. Biederman ` (12 subsequent siblings) 14 siblings, 0 replies; 36+ messages in thread From: Eric W. Biederman @ 2026-07-03 21:37 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, Andy Lutomirski, Kees Cook, Kusaram Devineni, Peter Zijlstra, Thomas Gleixner, Will Drewry, linux-kernel, Linus Torvalds, Christian Brauner Add a helper that spells out the logic of why both blocked and real_blocked need to be consulted to see if a signal is blocked. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- kernel/signal.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/kernel/signal.c b/kernel/signal.c index 6b49bae3fce7..1a8183606dc0 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -103,6 +103,25 @@ static bool sig_task_ignored(struct task_struct *t, int sig, bool force) return sig_handler_ignored(handler, sig); } +static bool sig_blocked(struct task_struct *t, int sig) +{ + /* + * Is calling the signal handler blocked? + * + * Two sigsets need to be consulted to see if the userspace + * signal handler can be invoked: thread->blocked and + * thread->real_blocked. Ordinarily thread->blocked contains + * all of the information and thread->real_blocked is empty. + * + * When thread->real_blocked is in use it contains the actual + * information on which signal handlers can not be invoked and + * thread->blocked has a subset of the signals contained in + * thread->real_blocked. 
+ */ + return sigismember(&t->blocked, sig) || + sigismember(&t->real_blocked, sig); +} + static bool sig_ignored(struct task_struct *t, int sig, bool force) { /* @@ -110,7 +129,7 @@ static bool sig_ignored(struct task_struct *t, int sig, bool force) * signal handler may change by the time it is * unblocked. */ - if (sigismember(&t->blocked, sig) || sigismember(&t->real_blocked, sig)) + if (sig_blocked(t, sig)) return false; /* -- 2.41.0 ^ permalink raw reply related [flat|nested] 36+ messages in thread
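The helper added by this patch can be tried out in user space with plain bitmasks. The following is a minimal sketch under assumptions: struct model_task, model_sigmask() and model_sig_blocked() are hypothetical names, and 64-bit masks stand in for sigset_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two per-thread sets; not the kernel's task_struct. */
struct model_task {
	uint64_t blocked;	/* stands in for thread->blocked */
	uint64_t real_blocked;	/* stands in for thread->real_blocked */
};

static uint64_t model_sigmask(int sig)
{
	return UINT64_C(1) << (sig - 1);
}

/* A handler cannot run if the signal is a member of either set. */
static bool model_sig_blocked(const struct model_task *t, int sig)
{
	return (t->blocked & model_sigmask(sig)) ||
	       (t->real_blocked & model_sigmask(sig));
}
```

A task whose blocked set is empty but whose real_blocked set contains signal 10 is reported as blocking 10 and not blocking 11, which is the case a check on blocked alone would miss.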
* [PATCH 03/14] signal: More accurate ignoring of signals based on sig_can_short_circuit 2026-07-03 21:35 ` [PATCH v2 00/14] " Eric W. Biederman 2026-07-03 21:36 ` [PATCH 01/14] signal: Generalize posixtimer_queue_sigqueue into enqueue_signal Eric W. Biederman 2026-07-03 21:37 ` [PATCH 02/14] signal: Factor out sig_blocked from sig_ignored Eric W. Biederman @ 2026-07-03 21:37 ` Eric W. Biederman 2026-07-03 21:38 ` [PATCH 04/14] signal: Use sig_can_short_circuit to improve fatal signal delivery Eric W. Biederman ` (11 subsequent siblings) 14 siblings, 0 replies; 36+ messages in thread From: Eric W. Biederman @ 2026-07-03 21:37 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, Andy Lutomirski, Kees Cook, Kusaram Devineni, Peter Zijlstra, Thomas Gleixner, Will Drewry, linux-kernel, Linus Torvalds, Christian Brauner

For a signal to be ignored two things need to happen:
- The conditions need to be present to ignore calling the signal handler.
- The signal needs to be deliverable to at least one thread of the process.

In rare cases, like unblocked signals on a dead thread, the current code will ignore signals that are blocked by all living threads. Opportunities to ignore signals are missed when another thread has the signal unblocked.

Implement sig_can_short_circuit to properly detect that short circuiting is possible.

Rename sig_task_ignored to sig_ignored.

Signed-off-by: "Eric W.
Biederman" <ebiederm@xmission.com> --- kernel/signal.c | 59 ++++++++++++++++++++++++++++++++++++------------- 1 file changed, 44 insertions(+), 15 deletions(-) diff --git a/kernel/signal.c b/kernel/signal.c index 1a8183606dc0..4429d3ec6776 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -81,7 +81,7 @@ static inline bool sig_handler_ignored(void __user *handler, int sig) (handler == SIG_DFL && sig_kernel_ignore(sig)); } -static bool sig_task_ignored(struct task_struct *t, int sig, bool force) +static bool sig_ignored(struct task_struct *t, int sig, bool force) { void __user *handler; @@ -122,25 +122,49 @@ static bool sig_blocked(struct task_struct *t, int sig) sigismember(&t->real_blocked, sig); } -static bool sig_ignored(struct task_struct *t, int sig, bool force) +static bool sig_can_short_circuit_to_thread(struct task_struct *thread, int sig) { + /* Only a living thread can receive a short circuit signal */ + if (__fatal_signal_pending(thread) || (thread->flags & PF_EXITING)) + return false; + /* - * Blocked signals are never ignored, since the - * signal handler may change by the time it is - * unblocked. + * If the signal handler is blocked then short circuit + * delivery may not happen because the signal handler may + * change by the time it is unblocked. */ - if (sig_blocked(t, sig)) + if (sig_blocked(thread, sig)) return false; /* - * Tracers may want to know about even ignored signal unless it - * is SIGKILL which can't be reported anyway but can be ignored - * by SIGNAL_UNKILLABLE task. + * Tracers are allowed to see and modify all signals. + * SIGKILL and the SA_IMMUTABLE signals are an exception. 
*/ - if (t->ptrace && sig != SIGKILL) + if (thread->ptrace && + (sig != SIGKILL) && + !(thread->sighand->action[sig - 1].sa.sa_flags & SA_IMMUTABLE)) return false; - return sig_task_ignored(t, sig, force); + return true; +} + +static bool sig_can_short_circuit(struct task_struct *p, enum pid_type type, int sig) +{ + /* + * Is there at least one thread where the short circuit + * delivery is valid? + */ + struct task_struct *thread; + + if (type == PIDTYPE_PID) + return sig_can_short_circuit_to_thread(p, sig); + + for_each_thread(p, thread) { + if (sig_can_short_circuit_to_thread(thread, sig)) + return true; + } + + return false; } /* @@ -887,7 +911,8 @@ static void ptrace_trap_notify(struct task_struct *t) * Returns true if the signal should be actually delivered, otherwise * it should be dropped. */ -static bool prepare_signal(int sig, struct task_struct *p, bool force) +static bool prepare_signal(int sig, struct task_struct *p, + enum pid_type type, bool force) { struct signal_struct *signal = p->signal; struct task_struct *t; @@ -951,7 +976,11 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force) } } - return !sig_ignored(p, sig, force); + /* Stop process the signal if nothing more needs to be done */ + if (sig_ignored(p, sig, force) && sig_can_short_circuit(p, type, sig)) + return false; + + return true; } /* @@ -1082,7 +1111,7 @@ static int __send_signal_locked(int sig, struct kernel_siginfo *info, lockdep_assert_held(&t->sighand->siglock); result = TRACE_SIGNAL_IGNORED; - if (!prepare_signal(sig, t, force)) + if (!prepare_signal(sig, t, type, force)) goto ret; pending = (type != PIDTYPE_PID) ? 
&t->signal->shared_pending : &t->pending; @@ -2020,7 +2049,7 @@ void posixtimer_send_sigqueue(struct k_itimer *tmr) */ tmr->it_sig_periodic = tmr->it_status == POSIX_TIMER_REQUEUE_PENDING; - if (!prepare_signal(sig, t, false)) { + if (!prepare_signal(sig, t, tmr->it_pid_type, false)) { result = TRACE_SIGNAL_IGNORED; if (!list_empty(&q->list)) { -- 2.41.0 ^ permalink raw reply related [flat|nested] 36+ messages in thread
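The per-thread and per-process checks in this patch can be sketched as a user-space model. Everything here is an assumption made for illustration: struct model_thread and the model_* functions are hypothetical, the dying flag folds together the __fatal_signal_pending and PF_EXITING tests, blocked folds together blocked and real_blocked, and the immutable argument stands in for the SA_IMMUTABLE lookup in the handler table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SIGKILL 9	/* hypothetical stand-in for SIGKILL */

/* Hypothetical per-thread state; field names are illustrative only. */
struct model_thread {
	bool dying;		/* fatal signal pending or already exiting */
	bool ptraced;
	uint64_t blocked;	/* union of blocked and real_blocked */
};

static uint64_t model_sigmask(int sig)
{
	return UINT64_C(1) << (sig - 1);
}

/* Thread-directed case: can this one thread take the short circuit? */
static bool model_short_circuit_to_thread(const struct model_thread *t,
					  int sig, bool immutable)
{
	if (t->dying)
		return false;	/* only a living thread can receive it */
	if (t->blocked & model_sigmask(sig))
		return false;	/* handler may change before unblocking */
	if (t->ptraced && sig != MODEL_SIGKILL && !immutable)
		return false;	/* the tracer must get to see the signal */
	return true;
}

/* Process-directed case: one eligible thread is enough. */
static bool model_can_short_circuit(const struct model_thread *threads,
				    size_t n, int sig, bool immutable)
{
	for (size_t i = 0; i < n; i++) {
		if (model_short_circuit_to_thread(&threads[i], sig, immutable))
			return true;
	}
	return false;
}
```

For a group made of one dying thread, one thread blocking signal 15, and one ordinary thread, the process-directed check succeeds for signal 15 because of the third thread; dropping that thread from the array makes it fail.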
* [PATCH 04/14] signal: Use sig_can_short_circuit to improve fatal signal delivery 2026-07-03 21:35 ` [PATCH v2 00/14] " Eric W. Biederman ` (2 preceding siblings ...) 2026-07-03 21:37 ` [PATCH 03/14] signal: More accurate ignoring of signals based on sig_can_short_circuit Eric W. Biederman @ 2026-07-03 21:38 ` Eric W. Biederman 2026-07-03 21:39 ` [PATCH 05/14] signal: Compute the exit_code in get_signal Eric W. Biederman ` (10 subsequent siblings) 14 siblings, 0 replies; 36+ messages in thread From: Eric W. Biederman @ 2026-07-03 21:38 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, Andy Lutomirski, Kees Cook, Kusaram Devineni, Peter Zijlstra, Thomas Gleixner, Will Drewry, linux-kernel, Linus Torvalds, Christian Brauner Today send_signal does not discover all of the fatal signals that qualify for short circuit delivery. In general this is not a problem as get_signal will handle any signals that reach it properly. Recognizing fatal signals in send_signal is necessary for the kernel's fatal_signal_pending test to work. Now that sig_can_short_circuit exists stop using a half-assed version of sig_can_short_circuit in complete_signal based on wants_signal. Instead use sig_can_short_circuit to implement fatal signal short circuit handling in enqueue_signal. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- kernel/signal.c | 58 +++++++++++++++++++++++-------------------------- 1 file changed, 27 insertions(+), 31 deletions(-) diff --git a/kernel/signal.c b/kernel/signal.c index 4429d3ec6776..b54669ac8e77 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1045,36 +1045,8 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type) } /* - * Found a killable thread. If the signal will be fatal, - * then start taking the whole group down immediately. 
- */ - if (sig_fatal(p, sig) && !sigismember(&t->real_blocked, sig) && - (sig == SIGKILL || !p->ptrace)) { - /* - * This signal will be fatal to the whole group. - */ - if (!sig_kernel_coredump(sig)) { - /* - * Start a group exit and wake everybody up. - * This way we don't have other threads - * running and doing things after a slower - * thread has the fatal signal pending. - */ - signal->flags = SIGNAL_GROUP_EXIT; - signal->group_exit_code = sig; - signal->group_stop_count = 0; - __for_each_thread(signal, t) { - task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK); - sigaddset(&t->pending.signal, SIGKILL); - signal_wake_up(t, 1); - } - return; - } - } - - /* - * The signal is already in the shared-pending queue. - * Tell the chosen thread to wake up and dequeue it. + * Found a killable thread. The signal is already in the + * queue. Tell the chosen thread to wake up and dequeue it. */ signal_wake_up(t, sig == SIGKILL); return; @@ -1086,13 +1058,37 @@ static void enqueue_signal(struct task_struct *t, enum pid_type type, struct signal_struct *signal = t->signal; struct sigpending *pending = (type != PIDTYPE_PID) ? &signal->shared_pending : &t->pending; + bool need_signal_wake_up = true; + + if (sig_fatal(t, sig) && !sig_kernel_coredump(sig) && + sig_can_short_circuit(t, type, sig)) { + struct task_struct *thread; + /* + * This signal will be fatal to the whole group. + * + * Start a group exit and wake everybody up. + * This way we don't have other threads + * running and doing things after a slower + * thread has the fatal signal pending. 
+ */ + signal->flags = SIGNAL_GROUP_EXIT; + signal->group_exit_code = sig; + signal->group_stop_count = 0; + __for_each_thread(signal, thread) { + task_clear_jobctl_pending(thread, JOBCTL_PENDING_MASK); + sigaddset(&thread->pending.signal, SIGKILL); + signal_wake_up(thread, 1); + } + need_signal_wake_up = false; + } signalfd_notify(t, sig); if (q) { list_add_tail(&q->list, &pending->list); } sigaddset(&pending->signal, sig); - complete_signal(sig, t, type); + if (need_signal_wake_up) + complete_signal(sig, t, type); } static inline bool legacy_queue(struct sigpending *signals, int sig) -- 2.41.0 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 05/14] signal: Compute the exit_code in get_signal 2026-07-03 21:35 ` [PATCH v2 00/14] " Eric W. Biederman ` (3 preceding siblings ...) 2026-07-03 21:38 ` [PATCH 04/14] signal: Use sig_can_short_circuit to improve fatal signal delivery Eric W. Biederman @ 2026-07-03 21:39 ` Eric W. Biederman 2026-07-03 21:39 ` Eric W. Biederman ` (9 subsequent siblings) 14 siblings, 0 replies; 36+ messages in thread From: Eric W. Biederman @ 2026-07-03 21:39 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, Andy Lutomirski, Kees Cook, Kusaram Devineni, Peter Zijlstra, Thomas Gleixner, Will Drewry, linux-kernel, Linus Torvalds, Christian Brauner Update get_signal so it calls do_group_exit with the correct exit_code. Make the default exit_code 0, so that the special case for threads killed by de_thread falls out naturally. Update do_group_exit to trust the exit_code passed in except when SIGNAL_GROUP_EXIT is set. Moving the computation of exit_code into get_signal makes other cleanups possible. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- kernel/exit.c | 4 ++-- kernel/signal.c | 12 ++++++++---- 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/kernel/exit.c b/kernel/exit.c index 1056422bc101..c8460c215189 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -1126,7 +1126,7 @@ do_group_exit(int exit_code) if (sig->flags & SIGNAL_GROUP_EXIT) exit_code = sig->group_exit_code; else if (sig->group_exec_task) - exit_code = 0; + ; else { struct sighand_struct *const sighand = current->sighand; @@ -1135,7 +1135,7 @@ do_group_exit(int exit_code) /* Another thread got here before we took the lock. 
*/ exit_code = sig->group_exit_code; else if (sig->group_exec_task) - exit_code = 0; + ; else { sig->group_exit_code = exit_code; sig->flags = SIGNAL_GROUP_EXIT; diff --git a/kernel/signal.c b/kernel/signal.c index b54669ac8e77..0e9103bda143 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -2908,10 +2908,13 @@ bool get_signal(struct ksignal *ksig) for (;;) { struct k_sigaction *ka; enum pid_type type; + int exit_code = 0; /* Has this task already been marked for death? */ if ((signal->flags & SIGNAL_GROUP_EXIT) || signal->group_exec_task) { + if (signal->flags & SIGNAL_GROUP_EXIT) + exit_code = signal->group_exit_code; signr = SIGKILL; sigdelset(&current->pending.signal, SIGKILL); trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO, @@ -3041,14 +3044,15 @@ bool get_signal(struct ksignal *ksig) continue; } + /* + * Anything else is fatal, maybe with a core dump. + */ + exit_code = signr; fatal: spin_unlock_irq(&sighand->siglock); if (unlikely(cgroup_task_frozen(current))) cgroup_leave_frozen(true); - /* - * Anything else is fatal, maybe with a core dump. - */ current->flags |= PF_SIGNALED; if (sig_kernel_coredump(signr)) { @@ -3078,7 +3082,7 @@ bool get_signal(struct ksignal *ksig) /* * Death signals, no core dump. */ - do_group_exit(signr); + do_group_exit(exit_code); /* NOTREACHED */ } spin_unlock_irq(&sighand->siglock); -- 2.41.0 ^ permalink raw reply related [flat|nested] 36+ messages in thread
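The exit_code selection described in this patch can be summarized with a small user-space function. This is a sketch only: model_get_signal_exit_code() is a hypothetical name, and its parameters stand in for the SIGNAL_GROUP_EXIT flag, signal->group_exit_code, signal->group_exec_task and the dequeued signr.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the exit_code choice made in get_signal() by
 * this patch. The default of 0 covers threads killed by de_thread().
 */
static int model_get_signal_exit_code(bool group_exit, int group_exit_code,
				      bool group_exec_task, int signr)
{
	int exit_code = 0;

	if (group_exit || group_exec_task) {
		/* Already marked for death: reuse the recorded code, if any. */
		if (group_exit)
			exit_code = group_exit_code;
		return exit_code;
	}

	/* A freshly dequeued fatal signal supplies the exit code itself. */
	return signr;
}
```

A thread being torn down by exec with no group exit in progress gets 0, a thread in a group exit gets the recorded group_exit_code, and a thread that dequeues a fatal signal gets that signal number.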
* [PATCH 05/14] signal: Compute the exit_code in get_signal 2026-07-03 21:35 ` [PATCH v2 00/14] " Eric W. Biederman ` (4 preceding siblings ...) 2026-07-03 21:39 ` [PATCH 05/14] signal: Compute the exit_code in get_signal Eric W. Biederman @ 2026-07-03 21:39 ` Eric W. Biederman 2026-07-03 21:40 ` [PATCH 06/14] signal: In get_signal call do_exit when it is unnecessary to shoot down threads Eric W. Biederman ` (8 subsequent siblings) 14 siblings, 0 replies; 36+ messages in thread From: Eric W. Biederman @ 2026-07-03 21:39 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, Andy Lutomirski, Kees Cook, Kusaram Devineni, Peter Zijlstra, Thomas Gleixner, Will Drewry, linux-kernel, Linus Torvalds, Christian Brauner Update get_signal so it calls do_group_exit with the correct exit_code. Make the default exit_code 0, so that the special case for threads killed by de_thread falls out naturally. Update do_group_exit to trust the exit_code passed in except when SIGNAL_GROUP_EXIT is set. Moving the computation of exit_code into get_signal makes other cleanups possible. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- kernel/exit.c | 4 ++-- kernel/signal.c | 12 ++++++++---- 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/kernel/exit.c b/kernel/exit.c index 1056422bc101..c8460c215189 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -1126,7 +1126,7 @@ do_group_exit(int exit_code) if (sig->flags & SIGNAL_GROUP_EXIT) exit_code = sig->group_exit_code; else if (sig->group_exec_task) - exit_code = 0; + ; else { struct sighand_struct *const sighand = current->sighand; @@ -1135,7 +1135,7 @@ do_group_exit(int exit_code) /* Another thread got here before we took the lock. 
*/
 			exit_code = sig->group_exit_code;
 		else if (sig->group_exec_task)
-			exit_code = 0;
+			;
 		else {
 			sig->group_exit_code = exit_code;
 			sig->flags = SIGNAL_GROUP_EXIT;
diff --git a/kernel/signal.c b/kernel/signal.c
index b54669ac8e77..0e9103bda143 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2908,10 +2908,13 @@ bool get_signal(struct ksignal *ksig)
 	for (;;) {
 		struct k_sigaction *ka;
 		enum pid_type type;
+		int exit_code = 0;
 
 		/* Has this task already been marked for death? */
 		if ((signal->flags & SIGNAL_GROUP_EXIT) ||
 		     signal->group_exec_task) {
+			if (signal->flags & SIGNAL_GROUP_EXIT)
+				exit_code = signal->group_exit_code;
 			signr = SIGKILL;
 			sigdelset(&current->pending.signal, SIGKILL);
 			trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO,
@@ -3041,14 +3044,15 @@ bool get_signal(struct ksignal *ksig)
 			continue;
 		}
 
+		/*
+		 * Anything else is fatal, maybe with a core dump.
+		 */
+		exit_code = signr;
 	fatal:
 		spin_unlock_irq(&sighand->siglock);
 		if (unlikely(cgroup_task_frozen(current)))
 			cgroup_leave_frozen(true);
 
-		/*
-		 * Anything else is fatal, maybe with a core dump.
-		 */
 		current->flags |= PF_SIGNALED;
 
 		if (sig_kernel_coredump(signr)) {
@@ -3078,7 +3082,7 @@ bool get_signal(struct ksignal *ksig)
 		/*
 		 * Death signals, no core dump.
 		 */
-		do_group_exit(signr);
+		do_group_exit(exit_code);
 		/* NOTREACHED */
 	}
 	spin_unlock_irq(&sighand->siglock);
-- 
2.41.0
^ permalink raw reply related [flat|nested] 36+ messages in thread
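The exit_code selection that this patch moves into get_signal can be sketched as a small userspace model. This is only an illustration under stated assumptions: struct sig_model and model_exit_code() are made-up names, group_exec_task is reduced to a bool (in the kernel it is a task pointer), and only the SIGNAL_GROUP_EXIT value and the three cases are taken from the patches in this thread.

```c
#include <stdbool.h>

/* Value of SIGNAL_GROUP_EXIT as quoted from include/linux/sched/signal.h. */
#define SIGNAL_GROUP_EXIT 0x00000004

/* Hypothetical stand-in for the signal_struct fields used here. */
struct sig_model {
	unsigned int flags;
	int group_exit_code;
	bool group_exec_task;	/* true while exec is killing the other threads */
};

/*
 * Model of the exit_code computed by get_signal after this patch.
 * signr is the fatal signal that would be dequeued if the task has
 * not already been marked for death.
 */
int model_exit_code(const struct sig_model *sig, int signr)
{
	int exit_code = 0;	/* default covers threads killed by de_thread */

	if ((sig->flags & SIGNAL_GROUP_EXIT) || sig->group_exec_task) {
		if (sig->flags & SIGNAL_GROUP_EXIT)
			exit_code = sig->group_exit_code;
		return exit_code;
	}
	/* Anything else is fatal: the signal number becomes the exit code. */
	return signr;
}
```

The point of the default of 0 is that the exec case needs no branch of its own: it is simply the path on which nothing overwrites exit_code.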
* [PATCH 06/14] signal: In get_signal call do_exit when it is unnecessary to shoot down threads 2026-07-03 21:35 ` [PATCH v2 00/14] " Eric W. Biederman ` (5 preceding siblings ...) 2026-07-03 21:39 ` Eric W. Biederman @ 2026-07-03 21:40 ` Eric W. Biederman 2026-07-03 21:40 ` [PATCH 07/14] signal: Bring down all threads when handling a non-coredump fatal signal Eric W. Biederman ` (7 subsequent siblings) 14 siblings, 0 replies; 36+ messages in thread From: Eric W. Biederman @ 2026-07-03 21:40 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, Andy Lutomirski, Kees Cook, Kusaram Devineni, Peter Zijlstra, Thomas Gleixner, Will Drewry, linux-kernel, Linus Torvalds, Christian Brauner In get_signal if other threads of the current process do not need to be shot down calling do_group_exit is equivalent to calling do_exit. The code in get_signal is only responsible for shooting down threads when it dequeues a signal and decides the signal is fatal. To remove special cases and make the code easier to read, call do_exit instead of do_group_exit when no other threads need to be shot down. With do_group_exit no longer being called when exec is terminating threads in de_thread remove the special case in do_group_exit for handling exec. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- kernel/exit.c | 4 ---- kernel/signal.c | 7 ++++++- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/kernel/exit.c b/kernel/exit.c index c8460c215189..55f03477cb08 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -1125,8 +1125,6 @@ do_group_exit(int exit_code) if (sig->flags & SIGNAL_GROUP_EXIT) exit_code = sig->group_exit_code; - else if (sig->group_exec_task) - ; else { struct sighand_struct *const sighand = current->sighand; @@ -1134,8 +1132,6 @@ do_group_exit(int exit_code) if (sig->flags & SIGNAL_GROUP_EXIT) /* Another thread got here before we took the lock. 
*/ exit_code = sig->group_exit_code; - else if (sig->group_exec_task) - ; else { sig->group_exit_code = exit_code; sig->flags = SIGNAL_GROUP_EXIT; diff --git a/kernel/signal.c b/kernel/signal.c index 0e9103bda143..94662010ab4c 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -2906,6 +2906,7 @@ bool get_signal(struct ksignal *ksig) } for (;;) { + bool group_exit_needed = false; struct k_sigaction *ka; enum pid_type type; int exit_code = 0; @@ -3048,6 +3049,7 @@ bool get_signal(struct ksignal *ksig) * Anything else is fatal, maybe with a core dump. */ exit_code = signr; + group_exit_needed = true; fatal: spin_unlock_irq(&sighand->siglock); if (unlikely(cgroup_task_frozen(current))) @@ -3082,7 +3084,10 @@ bool get_signal(struct ksignal *ksig) /* * Death signals, no core dump. */ - do_group_exit(exit_code); + if (group_exit_needed) + do_group_exit(exit_code); + else + do_exit(exit_code); /* NOTREACHED */ } spin_unlock_irq(&sighand->siglock); -- 2.41.0 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 07/14] signal: Bring down all threads when handling a non-coredump fatal signal 2026-07-03 21:35 ` [PATCH v2 00/14] " Eric W. Biederman ` (6 preceding siblings ...) 2026-07-03 21:40 ` [PATCH 06/14] signal: In get_signal call do_exit when it is unnecessary to shoot down threads Eric W. Biederman @ 2026-07-03 21:40 ` Eric W. Biederman 2026-07-03 21:41 ` [PATCH 08/14] signal: Move stopping for the coredump from do_exit into get_signal Eric W. Biederman ` (6 subsequent siblings) 14 siblings, 0 replies; 36+ messages in thread From: Eric W. Biederman @ 2026-07-03 21:40 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, Andy Lutomirski, Kees Cook, Kusaram Devineni, Peter Zijlstra, Thomas Gleixner, Will Drewry, linux-kernel, Linus Torvalds, Christian Brauner For non-coredump fatal signals instead of dropping and reacquiring siglock to shoot down the other threads from do_group_exit at the end of get_signal, shoot down the other threads before siglock is dropped. This can not be done for coredump signals yet, because do_coredump needs to be in a position to catch dying threads before it kills them so it can make certain to catch them, so they can be added to the coredump. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- kernel/signal.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/kernel/signal.c b/kernel/signal.c index 94662010ab4c..5ed3a02542ad 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -3049,7 +3049,21 @@ bool get_signal(struct ksignal *ksig) * Anything else is fatal, maybe with a core dump. 
*/ exit_code = signr; - group_exit_needed = true; + if (sig_kernel_coredump(signr)) + group_exit_needed = true; + else { + struct task_struct *t; + signal->flags = SIGNAL_GROUP_EXIT; + signal->group_exit_code = signr; + signal->group_stop_count = 0; + __for_each_thread(signal, t) { + task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK); + if (t != current) { + sigaddset(&t->pending.signal, SIGKILL); + signal_wake_up(t, 1); + } + } + } fatal: spin_unlock_irq(&sighand->siglock); if (unlikely(cgroup_task_frozen(current))) -- 2.41.0 ^ permalink raw reply related [flat|nested] 36+ messages in thread
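As a rough userspace model of the non-coredump branch added above: the group is marked as exiting and SIGKILL is queued on every thread except the caller. The struct and function names below are hypothetical, the array of threads stands in for __for_each_thread(), and siglock (which the kernel holds at this point) is not represented.

```c
#include <stdbool.h>
#include <stddef.h>

/* Value of SIGNAL_GROUP_EXIT as quoted from include/linux/sched/signal.h. */
#define SIGNAL_GROUP_EXIT 0x00000004

/* Hypothetical per-thread state: just what the loop touches. */
struct thread_model {
	bool sigkill_pending;	/* models sigaddset(&t->pending.signal, SIGKILL) */
	bool woken;		/* models signal_wake_up(t, 1) */
};

/* Hypothetical per-process state. */
struct group_model {
	unsigned int flags;
	int group_exit_code;
	int group_stop_count;
	struct thread_model *threads;
	size_t nr_threads;
};

/*
 * Mark the group as exiting with signr and shoot down every thread
 * other than the caller (index self).
 */
void model_kill_group(struct group_model *g, size_t self, int signr)
{
	g->flags = SIGNAL_GROUP_EXIT;
	g->group_exit_code = signr;
	g->group_stop_count = 0;
	for (size_t i = 0; i < g->nr_threads; i++) {
		if (i == self)
			continue;
		g->threads[i].sigkill_pending = true;
		g->threads[i].woken = true;
	}
}
```

Because the flag and the pending SIGKILLs are set in one pass, the other threads see the group as already marked for death the next time they enter get_signal.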
* [PATCH 08/14] signal: Move stopping for the coredump from do_exit into get_signal
2026-07-03 21:35 ` [PATCH v2 00/14] " Eric W. Biederman
` (7 preceding siblings ...)
2026-07-03 21:40 ` [PATCH 07/14] signal: Bring down all threads when handling a non-coredump fatal signal Eric W. Biederman
@ 2026-07-03 21:41 ` Eric W. Biederman
2026-07-03 21:41 ` [PATCH 09/14] signal: Move audit_core_dumps from do_coredump " Eric W. Biederman
` (5 subsequent siblings)
14 siblings, 0 replies; 36+ messages in thread
From: Eric W. Biederman @ 2026-07-03 21:41 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, Andy Lutomirski, Kees Cook, Kusaram Devineni,
Peter Zijlstra, Thomas Gleixner, Will Drewry, linux-kernel,
Linus Torvalds, Christian Brauner
Stopping to participate in a coredump from a kernel oops makes no
sense and is actively dangerous because the kernel is known to be
broken.
Considering whether to stop for a coredump when a kernel thread exits
is silly because userspace coredumps are not generated from kernel
threads.
Not stopping for a coredump in exit(2) and exit_group(2) and related
userspace exits that call do_exit or do_group_exit directly is the
current behavior of the code as the PF_SIGNALED test in
coredump_task_exit attests.
Since only tasks that pass through get_signal and set PF_SIGNALED can
join coredumps move stopping for coredumps into get_signal, where the
PF_SIGNALED test is unnecessary. This avoids even the potential of
stopping for coredumps in the silly or dangerous places.
This can be seen to be safe by examining the few places that call
do_exit:
- get_signal calling do_group_exit
Called by get_signal to terminate the userspace process. As stopping
for the coredump now happens in get_signal the code will continue to
participate in the coredump.
- exit_group(2) calling do_group_exit
If a thread calls exit_group(2) while another thread in the same
process is performing a coredump there is a race.
The thread that wins the race will take the lock and set
SIGNAL_GROUP_EXIT. If it is the thread that called do_group_exit
then zap_threads will return -EAGAIN and no coredump will be
generated. If it is the thread that is coredumping that wins the
race, the task that called do_group_exit will exit gracefully with
an error code before the coredump begins.
Having a single thread exit just before the coredump starts is not
ideal as the semantics make no sense. (Did the group exit happen
before the coredump or did the coredump happen before the group
exit?)
Eventually I intend for group exits to flow through get_signal and
this silliness will no longer be possible. Until then the current
behavior when this race occurs is maintained.
- io_uring
Called after get_signal returns to terminate the I/O worker thread
(essentially a userspace thread that only runs kernel code) so that
additional cleanup code can be run before do_exit.
As get_signal is called prior to do_exit the code will continue to
participate in the coredump.
- make_task_dead
Called on an unhandled kernel or hardware failure.
As the failure is unhandled any extra work has the potential to make
the failure worse so being part of a coredump is not appropriate.
- kthread_exit
Called to terminate a kernel thread; as such coredumps do not exist.
- call_usermodehelper_exec_async
Called to terminate a kernel thread if kernel_execve fails; as it is
a kernel thread coredumps do not exist.
- reboot, seccomp
These calls of do_exit() are semantically direct calls of exit(2)
today. As do_exit() does not synchronize with siglock there is no
logical race between a coredump killing the thread and these threads
exiting. These threads logically exit before the coredump happens.
This is also the current behavior so there is nothing to be
concerned about with respect to userspace semantics or regressions.
Moving the coredump stop for userspace threads that did not dequeue
the coredumping signal from do_exit into get_signal in general is
safe, because the coredump in the single threaded case completely
happens in get_signal. The code movement ensures that a
multi-threaded coredump will not have any issues because the
additional threads stop after some amount of cleanup has been done.
The coredump code is robust to all kinds of userspace changes
happening in parallel as multiple processes can share a mm. This
makes it safe to perform the coredump before the io_uring cleanup
happens as io_uring can't do anything another process sharing the mm
would not be doing.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/coredump.c            | 25 ++++++++++++++++++++++++-
 include/linux/coredump.h |  2 ++
 kernel/exit.c            | 35 +++++++----------------------------
 kernel/signal.c          |  5 +++++
 mm/oom_kill.c            |  2 +-
 5 files changed, 39 insertions(+), 30 deletions(-)
diff --git a/fs/coredump.c b/fs/coredump.c
index e68a76ff92a3..2b0b6c3c47ee 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -520,6 +520,29 @@ static int zap_threads(struct task_struct *tsk,
 	return nr;
 }
 
+void coredump_join(struct core_state *core_state)
+{
+	/* Stop and join the in-progress coredump */
+	struct core_thread self;
+
+	self.task = current;
+	self.next = xchg(&core_state->dumper.next, &self);
+	/*
+	 * Implies mb(), the result of xchg() must be visible
+	 * to core_state->dumper.
+ */ + if (atomic_dec_and_test(&core_state->nr_threads)) + complete(&core_state->startup); + + for (;;) { + set_current_state(TASK_IDLE|TASK_FREEZABLE); + if (!self.task) /* see coredump_finish() */ + break; + schedule(); + } + __set_current_state(TASK_RUNNING); +} + static int coredump_wait(int exit_code, struct core_state *core_state) { struct task_struct *tsk = current; @@ -566,7 +589,7 @@ static void coredump_finish(bool core_dumped) next = curr->next; task = curr->task; /* - * see coredump_task_exit(), curr->task must not see + * see coredump_join(), curr->task must not see * ->task == NULL before we read ->next. */ smp_mb(); diff --git a/include/linux/coredump.h b/include/linux/coredump.h index 7b38ee2e7913..22f46392b4d3 100644 --- a/include/linux/coredump.h +++ b/include/linux/coredump.h @@ -47,6 +47,7 @@ extern int dump_emit(struct coredump_params *cprm, const void *addr, int nr); extern int dump_align(struct coredump_params *cprm, int align); int dump_user_range(struct coredump_params *cprm, unsigned long start, unsigned long len); +extern void coredump_join(struct core_state *core_state); extern void vfs_coredump(const kernel_siginfo_t *siginfo); /* @@ -67,6 +68,7 @@ extern void vfs_coredump(const kernel_siginfo_t *siginfo); #define coredump_report_failure(fmt, ...) __COREDUMP_PRINTK(KERN_WARNING, fmt, ##__VA_ARGS__) #else +extern inline void coredump_join(struct core_state *core_state) {} static inline void vfs_coredump(const kernel_siginfo_t *siginfo) {} #define coredump_report(...) 
diff --git a/kernel/exit.c b/kernel/exit.c index 55f03477cb08..120743301a82 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -421,32 +421,6 @@ kill_orphaned_pgrp(struct task_struct *tsk, struct task_struct *parent) } } -static void coredump_task_exit(struct task_struct *tsk, - struct core_state *core_state) -{ - struct core_thread self; - - self.task = tsk; - if (self.task->flags & PF_SIGNALED) - self.next = xchg(&core_state->dumper.next, &self); - else - self.task = NULL; - /* - * Implies mb(), the result of xchg() must be visible - * to core_state->dumper. - */ - if (atomic_dec_and_test(&core_state->nr_threads)) - complete(&core_state->startup); - - for (;;) { - set_current_state(TASK_IDLE|TASK_FREEZABLE); - if (!self.task) /* see coredump_finish() */ - break; - schedule(); - } - __set_current_state(TASK_RUNNING); -} - #ifdef CONFIG_MEMCG /* drops tasklist_lock if succeeds */ static bool __try_to_set_owner(struct task_struct *tsk, struct mm_struct *mm) @@ -917,8 +891,13 @@ static void synchronize_group_exit(struct task_struct *tsk, long code) core_state = signal->core_state; spin_unlock_irq(&sighand->siglock); - if (unlikely(core_state)) - coredump_task_exit(tsk, core_state); + /* + * Decrement ->nr_threads and possibly complete + * core_state->startup to politely skip participating in any + * pending coredumps. 
+ */ + if (unlikely(core_state) && atomic_dec_and_test(&core_state->nr_threads)) + complete(&core_state->startup); } void __noreturn do_exit(long code) diff --git a/kernel/signal.c b/kernel/signal.c index 5ed3a02542ad..dd00e9879dcf 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -2907,6 +2907,7 @@ bool get_signal(struct ksignal *ksig) for (;;) { bool group_exit_needed = false; + struct core_state *core_state; struct k_sigaction *ka; enum pid_type type; int exit_code = 0; @@ -3065,6 +3066,7 @@ bool get_signal(struct ksignal *ksig) } } fatal: + core_state = signal->core_state; spin_unlock_irq(&sighand->siglock); if (unlikely(cgroup_task_frozen(current))) cgroup_leave_frozen(true); @@ -3084,6 +3086,9 @@ bool get_signal(struct ksignal *ksig) * that value and ignore the one we pass it. */ vfs_coredump(&ksig->info); + } else if (core_state) { + /* Wait for the coredump to happen */ + coredump_join(core_state); } /* diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 5f372f6e26fa..ff9d59963561 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -840,7 +840,7 @@ static inline bool __task_will_free_mem(struct task_struct *task) /* * A coredumping process may sleep for an extended period in - * coredump_task_exit(), so the oom killer cannot assume that + * get_signal(), so the oom killer cannot assume that * the process will promptly exit and release memory. */ if (sig->core_state) -- 2.41.0 ^ permalink raw reply related [flat|nested] 36+ messages in thread
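The bookkeeping that coredump_join() and the trimmed-down synchronize_group_exit() share can be sketched in userspace with C11 atomics. This is a model under stated assumptions, not the kernel code: the struct and function names are hypothetical, atomic_exchange() stands in for xchg(), atomic_fetch_sub() returning 1 stands in for atomic_dec_and_test(), a bool stands in for complete(&core_state->startup), and the sleep loop in coredump_join() is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct core_thread. */
struct core_thread_model {
	int task_id;
	struct core_thread_model *next;
};

/* Hypothetical userspace stand-in for struct core_state. */
struct core_state_model {
	atomic_int nr_threads;				/* threads still to check in */
	_Atomic(struct core_thread_model *) dumper_next;	/* head of joined list */
	bool startup_done;				/* stands in for complete() */
};

/*
 * Joining path: push self onto the dumper's list, then check in.
 * The last thread to check in signals startup.
 */
void model_join(struct core_state_model *cs, struct core_thread_model *self)
{
	self->next = atomic_exchange(&cs->dumper_next, self);
	/* fetch_sub returns the old value, so 1 means the count hit zero. */
	if (atomic_fetch_sub(&cs->nr_threads, 1) == 1)
		cs->startup_done = true;
}

/* Non-joining exit path: check in without being added to the list. */
void model_skip(struct core_state_model *cs)
{
	if (atomic_fetch_sub(&cs->nr_threads, 1) == 1)
		cs->startup_done = true;
}

/* Count the threads the dumper would later wait on. */
size_t model_count_joined(struct core_state_model *cs)
{
	size_t n = 0;
	struct core_thread_model *p = atomic_load(&cs->dumper_next);

	for (; p != NULL; p = p->next)
		n++;
	return n;
}
```

In this model a thread that exits through do_exit still decrements the counter, so the dumper is not left waiting for it, but it never appears on the list that the dumper walks.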
* [PATCH 09/14] signal: Move audit_core_dumps from do_coredump into get_signal 2026-07-03 21:35 ` [PATCH v2 00/14] " Eric W. Biederman ` (8 preceding siblings ...) 2026-07-03 21:41 ` [PATCH 08/14] signal: Move stopping for the coredump from do_exit into get_signal Eric W. Biederman @ 2026-07-03 21:41 ` Eric W. Biederman 2026-07-03 21:42 ` [PATCH 10/14] coredump: In zap_threads complete startup if there is no need to wait Eric W. Biederman ` (4 subsequent siblings) 14 siblings, 0 replies; 36+ messages in thread From: Eric W. Biederman @ 2026-07-03 21:41 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, Andy Lutomirski, Kees Cook, Kusaram Devineni, Peter Zijlstra, Thomas Gleixner, Will Drewry, linux-kernel, Linus Torvalds, Christian Brauner The function audit_core_dumps is not about the coredumps but about detecting the conditions that would trigger a coredump, and logging something when that happens. The function audit_core_dumps runs even if a coredump never happens. So move audit_core_dumps out of vfs_coredump and into get_signal to make it clear it does not care about the actual core dumps. Signed-off-by: "Eric W. 
Biederman" <ebiederm@xmission.com> --- fs/coredump.c | 2 -- kernel/signal.c | 1 + 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/coredump.c b/fs/coredump.c index 2b0b6c3c47ee..700814fc2ff6 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -1199,8 +1199,6 @@ void vfs_coredump(const kernel_siginfo_t *siginfo) .cpu = raw_smp_processor_id(), }; - audit_core_dumps(siginfo->si_signo); - if (coredump_skip(&cprm, binfmt)) return; diff --git a/kernel/signal.c b/kernel/signal.c index dd00e9879dcf..28e047d76043 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -3077,6 +3077,7 @@ bool get_signal(struct ksignal *ksig) if (print_fatal_signals) print_fatal_signal(signr); proc_coredump_connector(current); + audit_core_dumps(ksig->info.si_signo); /* * If it was able to dump core, this kills all * other threads in the group and synchronizes with -- 2.41.0 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 10/14] coredump: In zap_threads complete startup if there is no need to wait 2026-07-03 21:35 ` [PATCH v2 00/14] " Eric W. Biederman ` (9 preceding siblings ...) 2026-07-03 21:41 ` [PATCH 09/14] signal: Move audit_core_dumps from do_coredump " Eric W. Biederman @ 2026-07-03 21:42 ` Eric W. Biederman 2026-07-03 21:43 ` [PATCH 11/14] signal: Use the thread killing in get_signal for coredumps Eric W. Biederman ` (3 subsequent siblings) 14 siblings, 0 replies; 36+ messages in thread From: Eric W. Biederman @ 2026-07-03 21:42 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, Andy Lutomirski, Kees Cook, Kusaram Devineni, Peter Zijlstra, Thomas Gleixner, Will Drewry, linux-kernel, Linus Torvalds, Christian Brauner Remove the need to test the value of core_waiters in coredump_wait by completing core_state->startup when there is an error or there are no other tasks to wait for. This slightly simplifies the logic and prepares for moving zap_threads out of coredump_wait. Signed-off-by: "Eric W. 
Biederman" <ebiederm@xmission.com> --- fs/coredump.c | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/fs/coredump.c b/fs/coredump.c index 700814fc2ff6..4e0e9407704c 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -516,6 +516,8 @@ static int zap_threads(struct task_struct *tsk, tsk->flags |= PF_DUMPCORE; atomic_set(&core_state->nr_threads, nr); } + if (nr <= 0) + complete(&core_state->startup); spin_unlock_irq(&tsk->sighand->siglock); return nr; } @@ -546,28 +548,26 @@ void coredump_join(struct core_state *core_state) static int coredump_wait(int exit_code, struct core_state *core_state) { struct task_struct *tsk = current; - int core_waiters = -EBUSY; + struct core_thread *ptr; + int core_waiters; init_completion(&core_state->startup); core_state->dumper.task = tsk; core_state->dumper.next = NULL; core_waiters = zap_threads(tsk, core_state, exit_code); - if (core_waiters > 0) { - struct core_thread *ptr; - wait_for_completion_state(&core_state->startup, - TASK_UNINTERRUPTIBLE|TASK_FREEZABLE); - /* - * Wait for all the threads to become inactive, so that - * all the thread context (extended register state, like - * fpu etc) gets copied to the memory. - */ - ptr = core_state->dumper.next; - while (ptr != NULL) { - wait_task_inactive(ptr->task, TASK_ANY); - ptr = ptr->next; - } + wait_for_completion_state(&core_state->startup, + TASK_UNINTERRUPTIBLE|TASK_FREEZABLE); + /* + * Wait for all the threads to become inactive, so that + * all the thread context (extended register state, like + * fpu etc) gets copied to the memory. + */ + ptr = core_state->dumper.next; + while (ptr != NULL) { + wait_task_inactive(ptr->task, TASK_ANY); + ptr = ptr->next; } return core_waiters; -- 2.41.0 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 11/14] signal: Use the thread killing in get_signal for coredumps
2026-07-03 21:35 ` [PATCH v2 00/14] " Eric W. Biederman
` (10 preceding siblings ...)
2026-07-03 21:42 ` [PATCH 10/14] coredump: In zap_threads complete startup if there is no need to wait Eric W. Biederman
@ 2026-07-03 21:43 ` Eric W. Biederman
2026-07-03 21:43 ` [PATCH 12/14] exit: Make do_group_exit static Eric W. Biederman
` (2 subsequent siblings)
14 siblings, 0 replies; 36+ messages in thread
From: Eric W. Biederman @ 2026-07-03 21:43 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, Andy Lutomirski, Kees Cook, Kusaram Devineni,
Peter Zijlstra, Thomas Gleixner, Will Drewry, linux-kernel,
Linus Torvalds, Christian Brauner
Now that coredumps are per process there is no reason for the
coredump code to have its own routine to kill the threads of the
process.
The coredump code does need to have a routine to catch the threads
that will be part of the coredump, and to only catch them if a
coredump will be generated.
Split out coredump_begin from do_coredump so that the threads of the
process can be caught in the coredump. Also move the logic to decide
if a coredump should be generated into coredump_begin, with
do_coredump now simply returning immediately if coredump_begin has
decided not to capture a coredump.
Update get_signal to always shoot down the threads of the process,
and to call coredump_begin if a coredump needs to be started.
Remove the call of do_group_exit in get_signal as it is unnecessary.
The practical reason for splitting coredump_begin out from
do_coredump is so that I don't have to analyze if
cgroup_leave_frozen, print_fatal_signal, proc_coredump_connector and
audit_core_dumps can safely be called under siglock.
Signed-off-by: "Eric W. 
Biederman" <ebiederm@xmission.com> --- fs/coredump.c | 106 +++++++++++++++-------------------- include/linux/coredump.h | 2 + include/linux/sched/signal.h | 1 + kernel/signal.c | 44 ++++++--------- 4 files changed, 64 insertions(+), 89 deletions(-) diff --git a/fs/coredump.c b/fs/coredump.c index 4e0e9407704c..998800f171a4 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -480,46 +480,50 @@ static bool coredump_parse(struct core_name *cn, struct coredump_params *cprm, return true; } -static int zap_process(struct signal_struct *signal, int exit_code) +static inline bool coredump_skip(enum task_dumpable dumpable, + const struct linux_binfmt *binfmt) { + if (!binfmt) + return true; + if (!binfmt->core_dump) + return true; + if (dumpable == TASK_DUMPABLE_OFF) + return true; + return false; +} + +void coredump_begin(struct core_state *core_state) +{ + /* Called with siglock held */ + struct task_struct *tsk = current; + struct signal_struct *signal = tsk->signal; + struct mm_struct *mm = tsk->mm; + struct linux_binfmt * binfmt = mm->binfmt; + /* Snapshot dumpable for the dump */ + enum task_dumpable dumpable = task_exec_state_get_dumpable(tsk); struct task_struct *t; int nr = 0; - signal->flags = SIGNAL_GROUP_EXIT; - signal->group_exit_code = exit_code; - signal->group_stop_count = 0; + if (coredump_skip(dumpable, binfmt)) + return; - __for_each_thread(signal, t) { - task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK); - if (t != current && !(t->flags & PF_POSTCOREDUMP)) { - sigaddset(&t->pending.signal, SIGKILL); - signal_wake_up(t, 1); - nr++; - } - } + init_completion(&core_state->startup); + core_state->dumper.task = tsk; + core_state->dumper.next = NULL; + core_state->dumpable = dumpable; - return nr; -} + /* Count how may other threads will participate in the coredump */ + __for_each_thread(signal, t) + nr += (t != tsk) && !(t->flags & PF_POSTCOREDUMP); -static int zap_threads(struct task_struct *tsk, - struct core_state *core_state, int exit_code) -{ - struct 
signal_struct *signal = tsk->signal; - int nr = -EAGAIN; - - spin_lock_irq(&tsk->sighand->siglock); - if (!(signal->flags & SIGNAL_GROUP_EXIT) && !signal->group_exec_task) { - /* Allow SIGKILL, see prepare_signal() */ - signal->core_state = core_state; - nr = zap_process(signal, exit_code); - clear_tsk_thread_flag(tsk, TIF_SIGPENDING); - tsk->flags |= PF_DUMPCORE; - atomic_set(&core_state->nr_threads, nr); - } - if (nr <= 0) + atomic_set(&core_state->nr_threads, nr); + if (nr == 0) complete(&core_state->startup); - spin_unlock_irq(&tsk->sighand->siglock); - return nr; + + /* Allow SIGKILL, see prepare_signal() */ + signal->core_state = core_state; + clear_tsk_thread_flag(tsk, TIF_SIGPENDING); + tsk->flags |= PF_DUMPCORE; } void coredump_join(struct core_state *core_state) @@ -545,17 +549,9 @@ void coredump_join(struct core_state *core_state) __set_current_state(TASK_RUNNING); } -static int coredump_wait(int exit_code, struct core_state *core_state) +static void coredump_wait(struct core_state *core_state) { - struct task_struct *tsk = current; struct core_thread *ptr; - int core_waiters; - - init_completion(&core_state->startup); - core_state->dumper.task = tsk; - core_state->dumper.next = NULL; - - core_waiters = zap_threads(tsk, core_state, exit_code); wait_for_completion_state(&core_state->startup, TASK_UNINTERRUPTIBLE|TASK_FREEZABLE); @@ -569,8 +565,6 @@ static int coredump_wait(int exit_code, struct core_state *core_state) wait_task_inactive(ptr->task, TASK_ANY); ptr = ptr->next; } - - return core_waiters; } static void coredump_finish(bool core_dumped) @@ -1100,18 +1094,6 @@ static void coredump_cleanup(struct core_name *cn, struct coredump_params *cprm) coredump_finish(cn->core_dumped); } -static inline bool coredump_skip(const struct coredump_params *cprm, - const struct linux_binfmt *binfmt) -{ - if (!binfmt) - return true; - if (!binfmt->core_dump) - return true; - if (cprm->dumpable == TASK_DUMPABLE_OFF) - return true; - return false; -} - static void 
do_coredump(struct core_name *cn, struct coredump_params *cprm, size_t **argv, int *argc, const struct linux_binfmt *binfmt) { @@ -1184,7 +1166,7 @@ static void do_coredump(struct core_name *cn, struct coredump_params *cprm, void vfs_coredump(const kernel_siginfo_t *siginfo) { size_t *argv __free(kfree) = NULL; - struct core_state core_state; + struct core_state *core_state = current->signal->core_state; struct core_name cn; const struct mm_struct *mm = current->mm; const struct linux_binfmt *binfmt = mm->binfmt; @@ -1192,16 +1174,19 @@ void vfs_coredump(const kernel_siginfo_t *siginfo) struct coredump_params cprm = { .siginfo = siginfo, .limit = rlimit(RLIMIT_CORE), - /* Snapshot MMF_DUMP_FILTER_* (unlocked) and dumpable for the dump. */ + /* Snapshot MMF_DUMP_FILTER_* (unlocked) for the dump */ .mm_flags = __mm_flags_get_word(mm), - .dumpable = task_exec_state_get_dumpable(current), .vma_meta = NULL, .cpu = raw_smp_processor_id(), }; - if (coredump_skip(&cprm, binfmt)) + /* coredump_begin decided not to coredump */ + if (!core_state) return; + /* Copy the snapshot of dumpable into coredump_params */ + cprm.dumpable = core_state->dumpable; + CLASS(prepare_creds, cred)(); if (!cred) return; @@ -1214,8 +1199,7 @@ void vfs_coredump(const kernel_siginfo_t *siginfo) if (coredump_force_suid_safe(&cprm)) cred->fsuid = GLOBAL_ROOT_UID; - if (coredump_wait(siginfo->si_signo, &core_state) < 0) - return; + coredump_wait(core_state); scoped_with_creds(cred) do_coredump(&cn, &cprm, &argv, &argc, binfmt); diff --git a/include/linux/coredump.h b/include/linux/coredump.h index 22f46392b4d3..645ea675dc91 100644 --- a/include/linux/coredump.h +++ b/include/linux/coredump.h @@ -47,6 +47,7 @@ extern int dump_emit(struct coredump_params *cprm, const void *addr, int nr); extern int dump_align(struct coredump_params *cprm, int align); int dump_user_range(struct coredump_params *cprm, unsigned long start, unsigned long len); +extern void coredump_begin(struct core_state *core_state); 
extern void coredump_join(struct core_state *core_state); extern void vfs_coredump(const kernel_siginfo_t *siginfo); @@ -68,6 +69,7 @@ extern void vfs_coredump(const kernel_siginfo_t *siginfo); #define coredump_report_failure(fmt, ...) __COREDUMP_PRINTK(KERN_WARNING, fmt, ##__VA_ARGS__) #else +static inline void coredump_begin(struct core_state *core_state) {} extern inline void coredump_join(struct core_state *core_state) {} static inline void vfs_coredump(const kernel_siginfo_t *siginfo) {} diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h index 584ae88b435e..4ff1da6b841e 100644 --- a/include/linux/sched/signal.h +++ b/include/linux/sched/signal.h @@ -80,6 +80,7 @@ struct core_thread { struct core_state { atomic_t nr_threads; + enum task_dumpable dumpable; struct core_thread dumper; struct completion startup; }; diff --git a/kernel/signal.c b/kernel/signal.c index 28e047d76043..674d4b6d0b8a 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -2906,8 +2906,8 @@ bool get_signal(struct ksignal *ksig) } for (;;) { - bool group_exit_needed = false; - struct core_state *core_state; + struct core_state local_core_state, *core_state; + struct task_struct *t; struct k_sigaction *ka; enum pid_type type; int exit_code = 0; @@ -3050,22 +3050,20 @@ bool get_signal(struct ksignal *ksig) * Anything else is fatal, maybe with a core dump. 
*/ exit_code = signr; - if (sig_kernel_coredump(signr)) - group_exit_needed = true; - else { - struct task_struct *t; - signal->flags = SIGNAL_GROUP_EXIT; - signal->group_exit_code = signr; - signal->group_stop_count = 0; - __for_each_thread(signal, t) { - task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK); - if (t != current) { - sigaddset(&t->pending.signal, SIGKILL); - signal_wake_up(t, 1); - } + signal->flags = SIGNAL_GROUP_EXIT; + signal->group_exit_code = exit_code; + signal->group_stop_count = 0; + __for_each_thread(signal, t) { + task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK); + if (t != current) { + sigaddset(&t->pending.signal, SIGKILL); + signal_wake_up(t, 1); } } fatal: + /* Setup to collect a coredump */ + if (sig_kernel_coredump(signr)) + coredump_begin(&local_core_state); core_state = signal->core_state; spin_unlock_irq(&sighand->siglock); if (unlikely(cgroup_task_frozen(current))) @@ -3078,14 +3076,7 @@ bool get_signal(struct ksignal *ksig) print_fatal_signal(signr); proc_coredump_connector(current); audit_core_dumps(ksig->info.si_signo); - /* - * If it was able to dump core, this kills all - * other threads in the group and synchronizes with - * their demise. If we lost the race with another - * thread getting here, it set group_exit_code - * first and our do_group_exit call below will use - * that value and ignore the one we pass it. - */ + /* If dumping write out the coredump */ vfs_coredump(&ksig->info); } else if (core_state) { /* Wait for the coredump to happen */ @@ -3102,12 +3093,9 @@ bool get_signal(struct ksignal *ksig) goto out; /* - * Death signals, no core dump. + * Death signals. */ - if (group_exit_needed) - do_group_exit(exit_code); - else - do_exit(exit_code); + do_exit(exit_code); /* NOTREACHED */ } spin_unlock_irq(&sighand->siglock); -- 2.41.0 ^ permalink raw reply related [flat|nested] 36+ messages in thread
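The two decisions coredump_begin() makes under siglock, whether to dump at all and how many threads to wait for, can be sketched as plain functions. This is a userspace model with hypothetical names: DUMPABLE_OFF stands in for TASK_DUMPABLE_OFF, the other enum values are placeholders that do not come from the patch, and the bool fields stand in for binfmt->core_dump and PF_POSTCOREDUMP.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for enum task_dumpable; only OFF matters here. */
enum dumpable_model { DUMPABLE_OFF, DUMPABLE_USER, DUMPABLE_ROOT };

/* Hypothetical stand-in for struct linux_binfmt. */
struct binfmt_model {
	bool has_core_dump;	/* models binfmt->core_dump != NULL */
};

/* Hypothetical stand-in for the per-thread flag the loop checks. */
struct task_model {
	bool post_coredump;	/* models t->flags & PF_POSTCOREDUMP */
};

/* Mirrors coredump_skip(): no format, no dump hook, or not dumpable. */
bool model_coredump_skip(enum dumpable_model dumpable,
			 const struct binfmt_model *binfmt)
{
	if (!binfmt)
		return true;
	if (!binfmt->has_core_dump)
		return true;
	return dumpable == DUMPABLE_OFF;
}

/*
 * Mirrors the counting loop: every thread other than the dumper
 * (index self) that has not passed the coredump point participates.
 * A result of 0 is the case where startup is completed at once.
 */
int model_count_participants(const struct task_model *threads,
			     size_t nr, size_t self)
{
	int count = 0;

	for (size_t i = 0; i < nr; i++)
		count += (i != self) && !threads[i].post_coredump;
	return count;
}
```

Read this way, vfs_coredump() only has to look at whether signal->core_state was set: a skip in coredump_begin() leaves it NULL and the dump is not attempted.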
* [PATCH 12/14] exit: Make do_group_exit static
2026-07-03 21:35 ` [PATCH v2 00/14] " Eric W. Biederman
` (11 preceding siblings ...)
2026-07-03 21:43 ` [PATCH 11/14] signal: Use the thread killing in get_signal for coredumps Eric W. Biederman
@ 2026-07-03 21:43 ` Eric W. Biederman
2026-07-03 21:44 ` [PATCH 13/14] signal: Dequeue fatal signals Eric W. Biederman
2026-07-03 21:44 ` [PATCH 14/14] signal: Short circuit deliver coredump signals Eric W. Biederman
14 siblings, 0 replies; 36+ messages in thread
From: Eric W. Biederman @ 2026-07-03 21:43 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, Andy Lutomirski, Kees Cook, Kusaram Devineni,
Peter Zijlstra, Thomas Gleixner, Will Drewry, linux-kernel,
Linus Torvalds, Christian Brauner
Now that do_group_exit only has a single caller in exit.c make it
static so this is obvious.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/sched/task.h | 1 -
 kernel/exit.c              | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 41ed884cffc9..8b1a85a54999 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -90,7 +90,6 @@ static inline void exit_thread(struct task_struct *tsk)
 {
 }
 #endif
-extern __noreturn void do_group_exit(int);
 
 extern void exit_files(struct task_struct *);
 extern void exit_itimers(struct task_struct *);
diff --git a/kernel/exit.c b/kernel/exit.c
index 120743301a82..57beab2b9485 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1097,7 +1097,7 @@ SYSCALL_DEFINE1(exit, int, error_code)
  * Take down every thread in the group. This is called by fatal signals
  * as well as by sys_exit_group (below).
  */
-void __noreturn
+static void __noreturn
 do_group_exit(int exit_code)
 {
 	struct signal_struct *sig = current->signal;
-- 
2.41.0
* [PATCH 13/14] signal: Dequeue fatal signals
2026-07-03 21:35 ` [PATCH v2 00/14] " Eric W. Biederman
` (12 preceding siblings ...)
2026-07-03 21:43 ` [PATCH 12/14] exit: Make do_group_exit static Eric W. Biederman
@ 2026-07-03 21:44 ` Eric W. Biederman
2026-07-03 21:44 ` [PATCH 14/14] signal: Short circuit deliver coredump signals Eric W. Biederman
14 siblings, 0 replies; 36+ messages in thread
From: Eric W. Biederman @ 2026-07-03 21:44 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, Andy Lutomirski, Kees Cook, Kusaram Devineni,
Peter Zijlstra, Thomas Gleixner, Will Drewry, linux-kernel,
Linus Torvalds, Christian Brauner
Fatal signals are detected early and historically have not been
dequeued. This barely matters as the process exits immediately.
Not dequeuing the signal is visible to userspace inspecting the dying
process through proc and will be to coredumps once we start using
short circuit delivery for them.
Update the short circuit delivery for fatal signals to always place
the fatal signal on the shared signal queue.
To keep things simple always populate siginfo in dequeue_exit_signal
and always pass the dequeued siginfo to trace_signal_deliver.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/sched/signal.h |  1 +
 kernel/signal.c              | 43 +++++++++++++++++++++++++++++-------
 2 files changed, 36 insertions(+), 8 deletions(-)
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 4ff1da6b841e..17a5930cf98a 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -262,6 +262,7 @@ struct signal_struct {
 #define SIGNAL_STOP_STOPPED	0x00000001 /* job control stop in effect */
 #define SIGNAL_STOP_CONTINUED	0x00000002 /* SIGCONT since WCONTINUED reap */
 #define SIGNAL_GROUP_EXIT	0x00000004 /* group exit in progress */
+#define SIGNAL_EXIT_DEQUEUE	0x00000008 /* Dequeue the exit signal */
 /*
  * Pending notifications to parent.
  */
diff --git a/kernel/signal.c b/kernel/signal.c
index 674d4b6d0b8a..5bd07a90c689 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -708,6 +708,36 @@ int dequeue_signal(sigset_t *mask, kernel_siginfo_t *info, enum pid_type *type)
 }
 EXPORT_SYMBOL_GPL(dequeue_signal);
 
+static int dequeue_exit_signal(
+	struct task_struct *tsk, int exit_code, kernel_siginfo_t *info)
+{
+	struct signal_struct *signal = tsk->signal;
+
+	/* The default siginfo to return */
+	clear_siginfo(info);
+	info->si_signo = SIGKILL;
+	info->si_code = SI_KERNEL;
+
+	if (signal->flags & SIGNAL_EXIT_DEQUEUE) {
+		int sig = exit_code;
+		signal->flags &= ~SIGNAL_EXIT_DEQUEUE;
+
+		struct sigqueue *q =
+			list_last_entry_or_null(&signal->shared_pending.list,
+						struct sigqueue, list);
+		if (q && (q->info.si_signo == sig)) {
+			list_del_init(&q->list);
+			copy_siginfo(info, &q->info);
+			__sigqueue_free(q);
+		} else {
+			info->si_signo = sig;
+		}
+		sigdelset(&signal->shared_pending.signal, sig);
+	}
+
+	return info->si_signo;
+}
+
 static int dequeue_synchronous_signal(kernel_siginfo_t *info)
 {
 	struct task_struct *tsk = current;
@@ -1071,7 +1101,8 @@ static void enqueue_signal(struct task_struct *t, enum pid_type type,
 		 * running and doing things after a slower
 		 * thread has the fatal signal pending.
 		 */
-		signal->flags = SIGNAL_GROUP_EXIT;
+		pending = &signal->shared_pending;
+		signal->flags = SIGNAL_GROUP_EXIT | SIGNAL_EXIT_DEQUEUE;
 		signal->group_exit_code = sig;
 		signal->group_stop_count = 0;
 		__for_each_thread(signal, thread) {
@@ -2917,15 +2948,11 @@ bool get_signal(struct ksignal *ksig)
 		     signal->group_exec_task) {
 			if (signal->flags & SIGNAL_GROUP_EXIT)
 				exit_code = signal->group_exit_code;
-			signr = SIGKILL;
 			sigdelset(&current->pending.signal, SIGKILL);
-			trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO,
-				&sighand->action[SIGKILL-1]);
+			signr = dequeue_exit_signal(current, exit_code, &ksig->info);
+			trace_signal_deliver(signr, &ksig->info,
+				&sighand->action[signr-1]);
 			recalc_sigpending();
-			/*
-			 * implies do_group_exit() or return to PF_USER_WORKER,
-			 * no need to initialize ksig->info/etc.
-			 */
 			goto fatal;
 		}
 
-- 
2.41.0
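For readers following along, the dequeue rule in this patch can be
modelled in userspace roughly as below. This is only a sketch, not
kernel code: the model_* types, MODEL_EXIT_DEQUEUE and MODEL_SI_KERNEL
are hypothetical stand-ins for signal_struct, sigqueue,
SIGNAL_EXIT_DEQUEUE and SI_KERNEL, and a singly linked "prev" chain
replaces the kernel list. It also skips the sigdelset() bookkeeping.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
#define MODEL_EXIT_DEQUEUE 0x8u	/* plays the role of SIGNAL_EXIT_DEQUEUE */
#define MODEL_SI_KERNEL    0x80	/* placeholder for SI_KERNEL */

struct model_siginfo {
	int signo;
	int code;
};

struct model_sigqueue {
	struct model_siginfo info;
	struct model_sigqueue *prev;	/* next-older queued entry */
};

struct model_signal {
	unsigned int flags;
	struct model_sigqueue *last;	/* newest entry on the shared queue */
};

/*
 * Return the signal number to report for a dying task and fill *info.
 * Default is a kernel-generated SIGKILL. If the dequeue flag is set,
 * take the newest shared entry when it matches exit_code, otherwise
 * report exit_code with the default siginfo fields.
 */
int model_dequeue_exit_signal(struct model_signal *state, int exit_code,
			      struct model_siginfo *info)
{
	assert(state != NULL && info != NULL);

	info->signo = SIGKILL;
	info->code = MODEL_SI_KERNEL;

	if (state->flags & MODEL_EXIT_DEQUEUE) {
		struct model_sigqueue *q = state->last;

		/* The flag is one-shot: only the first caller dequeues. */
		state->flags &= ~MODEL_EXIT_DEQUEUE;

		if (q && q->info.signo == exit_code) {
			state->last = q->prev;	/* unlink the entry */
			*info = q->info;	/* keep the sender's siginfo */
		} else {
			info->signo = exit_code;
		}
	}
	return info->signo;
}
```

In this model the first caller sees the queued siginfo and every later
caller gets the plain SIGKILL default, which is how I read the
intended behaviour of the SIGNAL_EXIT_DEQUEUE flag.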
* [PATCH 14/14] signal: Short circuit deliver coredump signals
2026-07-03 21:35 ` [PATCH v2 00/14] " Eric W. Biederman
` (13 preceding siblings ...)
2026-07-03 21:44 ` [PATCH 13/14] signal: Dequeue fatal signals Eric W. Biederman
@ 2026-07-03 21:44 ` Eric W. Biederman
14 siblings, 0 replies; 36+ messages in thread
From: Eric W. Biederman @ 2026-07-03 21:44 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, Andy Lutomirski, Kees Cook, Kusaram Devineni,
Peter Zijlstra, Thomas Gleixner, Will Drewry, linux-kernel,
Linus Torvalds, Christian Brauner
The coredump rendezvous start is the same as the process killing that
complete_signal performs. get_signal now gets the siginfo and the
signal number when a signal is short circuit delivered.
Start short circuit delivering coredump signals as there is nothing
remaining that prevents their short circuit delivery.
This means that processes that coredump will now exit faster and
fatal_signal_pending will return true until the coredump starts.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/signal.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/kernel/signal.c b/kernel/signal.c
index 5bd07a90c689..f41233e32c62 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1090,8 +1090,7 @@ static void enqueue_signal(struct task_struct *t, enum pid_type type,
 		&signal->shared_pending : &t->pending;
 	bool need_signal_wake_up = true;
 
-	if (sig_fatal(t, sig) && !sig_kernel_coredump(sig) &&
-	    sig_can_short_circuit(t, type, sig)) {
+	if (sig_fatal(t, sig) && sig_can_short_circuit(t, type, sig)) {
 		struct task_struct *thread;
 		/*
 		 * This signal will be fatal to the whole group.
-- 
2.41.0
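The one-line change to the condition in enqueue_signal can be read as
dropping a single term from a predicate. A small sketch of the old and
new forms, assuming the three kernel helpers are reduced to
precomputed booleans (struct model_sig and both function names are
hypothetical and exist only for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical flattened inputs. In the kernel these values would come
 * from sig_fatal(), sig_kernel_coredump() and sig_can_short_circuit().
 */
struct model_sig {
	bool fatal;
	bool coredump;
	bool can_short_circuit;
};

/* Condition before the patch: coredump signals are excluded. */
bool short_circuit_before(const struct model_sig *s)
{
	assert(s != NULL);
	return s->fatal && !s->coredump && s->can_short_circuit;
}

/* Condition after the patch: the coredump exclusion is gone. */
bool short_circuit_after(const struct model_sig *s)
{
	assert(s != NULL);
	return s->fatal && s->can_short_circuit;
}
```

The two predicates differ only when fatal, coredump and
can_short_circuit are all true, so in this model the patch changes
behaviour for fatal coredump signals that are otherwise eligible for
short circuit delivery and for nothing else.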
end of thread, other threads:[~2026-07-03 21:44 UTC | newest]
Thread overview: 36+ messages
2026-06-19 13:27 [PATCH v2 1/3] signal: change force_sig_info_to_task() to call __send_signal_locked() Oleg Nesterov
2026-06-19 13:27 ` [PATCH v2 2/3] signal: turn the "bool force" arg of __send_signal_locked() into "int flags" Oleg Nesterov
2026-06-19 13:28 ` [PATCH v2 3/3] signal: fix evasion of SA_IMMUTABLE signals Oleg Nesterov
2026-06-26 16:52 ` [PATCH 0/11] Short circuit delivery for coredump signals Eric W. Biederman
2026-06-26 16:54 ` [PATCH 01/11] signal: Compute the exit_code in get_signal Eric W. Biederman
2026-06-26 16:54 ` [PATCH 02/11] signal: In get_signal call do_exit when it is unnecessary to shoot down threads Eric W. Biederman
2026-06-26 16:55 ` [PATCH 03/11] signal: Bring down all threads when handling a non-coredump fatal signal Eric W. Biederman
2026-06-26 16:55 ` [PATCH 04/11] signal: Move stopping for the coredump from do_exit into get_signal Eric W. Biederman
2026-06-26 16:56 ` [PATCH 05/11] signal: Move audit_core_dumps from do_coredump " Eric W. Biederman
2026-06-26 16:57 ` [PATCH 06/11] coredump: In zap_threads complete startup if there is no need to wait Eric W. Biederman
2026-06-26 16:57 ` [PATCH 07/11] signal: Use the thread killing in get_signal for coredumps Eric W. Biederman
2026-06-26 16:58 ` [PATCH 08/11] exit: Make do_group_exit static Eric W. Biederman
2026-06-26 16:59 ` [PATCH 09/11] signal: Dequeue fatal signals Eric W. Biederman
2026-06-26 16:59 ` [PATCH 10/11] signal: Short circuit deliver coredump signals Eric W. Biederman
2026-06-26 17:00 ` [PATCH 11/11] signal: Remove SA_IMMUTABLE Eric W. Biederman
2026-06-28 14:29 ` [PATCH 0/11] Short circuit delivery for coredump signals Oleg Nesterov
2026-06-29 6:22 ` Eric W. Biederman
2026-06-29 17:45 ` Eric W. Biederman
2026-07-02 10:36 ` Oleg Nesterov
2026-07-03 20:16 ` Eric W. Biederman
2026-07-03 21:35 ` [PATCH v2 00/14] " Eric W. Biederman
2026-07-03 21:36 ` [PATCH 01/14] signal: Generalize posixtimer_queue_sigqueue into enqueue_signal Eric W. Biederman
2026-07-03 21:37 ` [PATCH 02/14] signal: Factor out sig_blocked from sig_ignored Eric W. Biederman
2026-07-03 21:37 ` [PATCH 03/14] signal: More accurate ignoring of signals based on sig_can_short_circuit Eric W. Biederman
2026-07-03 21:38 ` [PATCH 04/14] signal: Use sig_can_short_circuit to improve fatal signal delivery Eric W. Biederman
2026-07-03 21:39 ` [PATCH 05/14] signal: Compute the exit_code in get_signal Eric W. Biederman
2026-07-03 21:39 ` Eric W. Biederman
2026-07-03 21:40 ` [PATCH 06/14] signal: In get_signal call do_exit when it is unnecessary to shoot down threads Eric W. Biederman
2026-07-03 21:40 ` [PATCH 07/14] signal: Bring down all threads when handling a non-coredump fatal signal Eric W. Biederman
2026-07-03 21:41 ` [PATCH 08/14] signal: Move stopping for the coredump from do_exit into get_signal Eric W. Biederman
2026-07-03 21:41 ` [PATCH 09/14] signal: Move audit_core_dumps from do_coredump " Eric W. Biederman
2026-07-03 21:42 ` [PATCH 10/14] coredump: In zap_threads complete startup if there is no need to wait Eric W. Biederman
2026-07-03 21:43 ` [PATCH 11/14] signal: Use the thread killing in get_signal for coredumps Eric W. Biederman
2026-07-03 21:43 ` [PATCH 12/14] exit: Make do_group_exit static Eric W. Biederman
2026-07-03 21:44 ` [PATCH 13/14] signal: Dequeue fatal signals Eric W. Biederman
2026-07-03 21:44 ` [PATCH 14/14] signal: Short circuit deliver coredump signals Eric W. Biederman