From: "Eric W. Biederman"
To: Oleg Nesterov
Cc: Andrew Morton, Andy Lutomirski, Kees Cook, Kusaram Devineni,
    Peter Zijlstra, Thomas Gleixner, Will Drewry,
    linux-kernel@vger.kernel.org, Linus Torvalds, Christian Brauner
Subject: [PATCH 11/14] signal: Use the thread killing in get_signal for coredumps
Date: Fri, 03 Jul 2026 16:43:01 -0500
Message-ID: <87a4s721i2.fsf_-_@email.froward.int.ebiederm.org>
In-Reply-To: <877bnb4uyw.fsf_-_@email.froward.int.ebiederm.org> (Eric W. Biederman's message of "Fri, 03 Jul 2026 16:35:51 -0500")
References: <87o6gx9rc4.fsf@email.froward.int.ebiederm.org>
    <877bnh7tnf.fsf@email.froward.int.ebiederm.org>
    <877bnb4uyw.fsf_-_@email.froward.int.ebiederm.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Now that coredumps are per process there is no reason for the coredump
code to have its own routine to kill the threads of the process.

The coredump code does need to have a routine to catch the threads
that will be part of the coredump, and to only catch them if a
coredump will be generated.

Split out coredump_begin from do_coredump so that the threads of the
process can be caught in the coredump.  Also move the logic to decide
if a coredump should be generated into coredump_begin, with
do_coredump now simply returning immediately if coredump_begin has
decided not to capture a coredump.

Update get_signal to always shoot down the threads of the process, and
to call coredump_begin if a coredump needs to be started.

Remove the call of do_group_exit in get_signal as it is unnecessary.

The practical reason for splitting coredump_begin out from do_coredump
is so that I don't have to analyze if cgroup_leave_frozen,
print_fatal_signal, proc_coredump_connector and audit_core_dumps can
safely be called under siglock.

Signed-off-by: "Eric W. Biederman"
---
 fs/coredump.c                | 106 +++++++++++++++--------------------
 include/linux/coredump.h     |   2 +
 include/linux/sched/signal.h |   1 +
 kernel/signal.c              |  44 ++++++---------
 4 files changed, 64 insertions(+), 89 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 4e0e9407704c..998800f171a4 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -480,46 +480,50 @@ static bool coredump_parse(struct core_name *cn, struct coredump_params *cprm,
 	return true;
 }
 
-static int zap_process(struct signal_struct *signal, int exit_code)
+static inline bool coredump_skip(enum task_dumpable dumpable,
+				 const struct linux_binfmt *binfmt)
 {
+	if (!binfmt)
+		return true;
+	if (!binfmt->core_dump)
+		return true;
+	if (dumpable == TASK_DUMPABLE_OFF)
+		return true;
+	return false;
+}
+
+void coredump_begin(struct core_state *core_state)
+{
+	/* Called with siglock held */
+	struct task_struct *tsk = current;
+	struct signal_struct *signal = tsk->signal;
+	struct mm_struct *mm = tsk->mm;
+	struct linux_binfmt * binfmt = mm->binfmt;
+	/* Snapshot dumpable for the dump */
+	enum task_dumpable dumpable = task_exec_state_get_dumpable(tsk);
 	struct task_struct *t;
 	int nr = 0;
 
-	signal->flags = SIGNAL_GROUP_EXIT;
-	signal->group_exit_code = exit_code;
-	signal->group_stop_count = 0;
+	if (coredump_skip(dumpable, binfmt))
+		return;
 
-	__for_each_thread(signal, t) {
-		task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
-		if (t != current && !(t->flags & PF_POSTCOREDUMP)) {
-			sigaddset(&t->pending.signal, SIGKILL);
-			signal_wake_up(t, 1);
-			nr++;
-		}
-	}
+	init_completion(&core_state->startup);
+	core_state->dumper.task = tsk;
+	core_state->dumper.next = NULL;
+	core_state->dumpable = dumpable;
 
-	return nr;
-}
+	/* Count how many other threads will participate in the coredump */
+	__for_each_thread(signal, t)
+		nr += (t != tsk) && !(t->flags & PF_POSTCOREDUMP);
 
-static int zap_threads(struct task_struct *tsk,
-			struct core_state *core_state, int exit_code)
-{
-	struct signal_struct *signal = tsk->signal;
-	int nr = -EAGAIN;
-
-	spin_lock_irq(&tsk->sighand->siglock);
-	if (!(signal->flags & SIGNAL_GROUP_EXIT) && !signal->group_exec_task) {
-		/* Allow SIGKILL, see prepare_signal() */
-		signal->core_state = core_state;
-		nr = zap_process(signal, exit_code);
-		clear_tsk_thread_flag(tsk, TIF_SIGPENDING);
-		tsk->flags |= PF_DUMPCORE;
-		atomic_set(&core_state->nr_threads, nr);
-	}
-	if (nr <= 0)
+	atomic_set(&core_state->nr_threads, nr);
+	if (nr == 0)
 		complete(&core_state->startup);
-	spin_unlock_irq(&tsk->sighand->siglock);
-	return nr;
+
+	/* Allow SIGKILL, see prepare_signal() */
+	signal->core_state = core_state;
+	clear_tsk_thread_flag(tsk, TIF_SIGPENDING);
+	tsk->flags |= PF_DUMPCORE;
 }
 
 void coredump_join(struct core_state *core_state)
@@ -545,17 +549,9 @@ void coredump_join(struct core_state *core_state)
 	__set_current_state(TASK_RUNNING);
 }
 
-static int coredump_wait(int exit_code, struct core_state *core_state)
+static void coredump_wait(struct core_state *core_state)
 {
-	struct task_struct *tsk = current;
 	struct core_thread *ptr;
-	int core_waiters;
-
-	init_completion(&core_state->startup);
-	core_state->dumper.task = tsk;
-	core_state->dumper.next = NULL;
-
-	core_waiters = zap_threads(tsk, core_state, exit_code);
 
 	wait_for_completion_state(&core_state->startup,
 				  TASK_UNINTERRUPTIBLE|TASK_FREEZABLE);
@@ -569,8 +565,6 @@ static int coredump_wait(int exit_code, struct core_state *core_state)
 		wait_task_inactive(ptr->task, TASK_ANY);
 		ptr = ptr->next;
 	}
-
-	return core_waiters;
 }
 
 static void coredump_finish(bool core_dumped)
@@ -1100,18 +1094,6 @@ static void coredump_cleanup(struct core_name *cn, struct coredump_params *cprm)
 	coredump_finish(cn->core_dumped);
 }
 
-static inline bool coredump_skip(const struct coredump_params *cprm,
-				 const struct linux_binfmt *binfmt)
-{
-	if (!binfmt)
-		return true;
-	if (!binfmt->core_dump)
-		return true;
-	if (cprm->dumpable == TASK_DUMPABLE_OFF)
-		return true;
-	return false;
-}
-
 static void do_coredump(struct core_name *cn, struct coredump_params *cprm,
 			size_t **argv, int *argc, const struct linux_binfmt *binfmt)
 {
@@ -1184,7 +1166,7 @@ static void do_coredump(struct core_name *cn, struct coredump_params *cprm,
 void vfs_coredump(const kernel_siginfo_t *siginfo)
 {
 	size_t *argv __free(kfree) = NULL;
-	struct core_state core_state;
+	struct core_state *core_state = current->signal->core_state;
 	struct core_name cn;
 	const struct mm_struct *mm = current->mm;
 	const struct linux_binfmt *binfmt = mm->binfmt;
@@ -1192,16 +1174,19 @@ void vfs_coredump(const kernel_siginfo_t *siginfo)
 	struct coredump_params cprm = {
 		.siginfo = siginfo,
 		.limit = rlimit(RLIMIT_CORE),
-		/* Snapshot MMF_DUMP_FILTER_* (unlocked) and dumpable for the dump. */
+		/* Snapshot MMF_DUMP_FILTER_* (unlocked) for the dump */
 		.mm_flags = __mm_flags_get_word(mm),
-		.dumpable = task_exec_state_get_dumpable(current),
 		.vma_meta = NULL,
 		.cpu = raw_smp_processor_id(),
 	};
 
-	if (coredump_skip(&cprm, binfmt))
+	/* coredump_begin decided not to coredump */
+	if (!core_state)
 		return;
 
+	/* Copy the snapshot of dumpable into coredump_params */
+	cprm.dumpable = core_state->dumpable;
+
 	CLASS(prepare_creds, cred)();
 	if (!cred)
 		return;
@@ -1214,8 +1199,7 @@ void vfs_coredump(const kernel_siginfo_t *siginfo)
 	if (coredump_force_suid_safe(&cprm))
 		cred->fsuid = GLOBAL_ROOT_UID;
 
-	if (coredump_wait(siginfo->si_signo, &core_state) < 0)
-		return;
+	coredump_wait(core_state);
 
 	scoped_with_creds(cred)
 		do_coredump(&cn, &cprm, &argv, &argc, binfmt);
diff --git a/include/linux/coredump.h b/include/linux/coredump.h
index 22f46392b4d3..645ea675dc91 100644
--- a/include/linux/coredump.h
+++ b/include/linux/coredump.h
@@ -47,6 +47,7 @@ extern int dump_emit(struct coredump_params *cprm, const void *addr, int nr);
 extern int dump_align(struct coredump_params *cprm, int align);
 int dump_user_range(struct coredump_params *cprm, unsigned long start,
 		    unsigned long len);
+extern void coredump_begin(struct core_state *core_state);
 extern void coredump_join(struct core_state *core_state);
 extern void vfs_coredump(const kernel_siginfo_t *siginfo);
 
@@ -68,6 +69,7 @@ extern void vfs_coredump(const kernel_siginfo_t *siginfo);
 #define coredump_report_failure(fmt, ...) __COREDUMP_PRINTK(KERN_WARNING, fmt, ##__VA_ARGS__)
 
 #else
+static inline void coredump_begin(struct core_state *core_state) {}
 extern inline void coredump_join(struct core_state *core_state) {}
 static inline void vfs_coredump(const kernel_siginfo_t *siginfo) {}
 
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 584ae88b435e..4ff1da6b841e 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -80,6 +80,7 @@ struct core_thread {
 
 struct core_state {
 	atomic_t nr_threads;
+	enum task_dumpable dumpable;
 	struct core_thread dumper;
 	struct completion startup;
 };
diff --git a/kernel/signal.c b/kernel/signal.c
index 28e047d76043..674d4b6d0b8a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2906,8 +2906,8 @@ bool get_signal(struct ksignal *ksig)
 	}
 
 	for (;;) {
-		bool group_exit_needed = false;
-		struct core_state *core_state;
+		struct core_state local_core_state, *core_state;
+		struct task_struct *t;
 		struct k_sigaction *ka;
 		enum pid_type type;
 		int exit_code = 0;
@@ -3050,22 +3050,20 @@ bool get_signal(struct ksignal *ksig)
 		 * Anything else is fatal, maybe with a core dump.
 		 */
 		exit_code = signr;
-		if (sig_kernel_coredump(signr))
-			group_exit_needed = true;
-		else {
-			struct task_struct *t;
-			signal->flags = SIGNAL_GROUP_EXIT;
-			signal->group_exit_code = signr;
-			signal->group_stop_count = 0;
-			__for_each_thread(signal, t) {
-				task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
-				if (t != current) {
-					sigaddset(&t->pending.signal, SIGKILL);
-					signal_wake_up(t, 1);
-				}
+		signal->flags = SIGNAL_GROUP_EXIT;
+		signal->group_exit_code = exit_code;
+		signal->group_stop_count = 0;
+		__for_each_thread(signal, t) {
+			task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
+			if (t != current) {
+				sigaddset(&t->pending.signal, SIGKILL);
+				signal_wake_up(t, 1);
 			}
 		}
 	fatal:
+		/* Setup to collect a coredump */
+		if (sig_kernel_coredump(signr))
+			coredump_begin(&local_core_state);
 		core_state = signal->core_state;
 		spin_unlock_irq(&sighand->siglock);
 		if (unlikely(cgroup_task_frozen(current)))
@@ -3078,14 +3076,7 @@ bool get_signal(struct ksignal *ksig)
 				print_fatal_signal(signr);
 			proc_coredump_connector(current);
 			audit_core_dumps(ksig->info.si_signo);
-			/*
-			 * If it was able to dump core, this kills all
-			 * other threads in the group and synchronizes with
-			 * their demise.  If we lost the race with another
-			 * thread getting here, it set group_exit_code
-			 * first and our do_group_exit call below will use
-			 * that value and ignore the one we pass it.
-			 */
+			/* If dumping write out the coredump */
 			vfs_coredump(&ksig->info);
 		} else if (core_state) {
 			/* Wait for the coredump to happen */
@@ -3102,12 +3093,9 @@ bool get_signal(struct ksignal *ksig)
 			goto out;
 
 		/*
-		 * Death signals, no core dump.
+		 * Death signals.
 		 */
-		if (group_exit_needed)
-			do_group_exit(exit_code);
-		else
-			do_exit(exit_code);
+		do_exit(exit_code);
 		/* NOTREACHED */
 	}
 	spin_unlock_irq(&sighand->siglock);
-- 
2.41.0