From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-0.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 52620C3279B for ; Tue, 10 Jul 2018 16:01:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id F0A82208EC for ; Tue, 10 Jul 2018 16:01:25 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org F0A82208EC Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=xmission.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934357AbeGJQBX (ORCPT ); Tue, 10 Jul 2018 12:01:23 -0400 Received: from out03.mta.xmission.com ([166.70.13.233]:45776 "EHLO out03.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S934062AbeGJQBV (ORCPT ); Tue, 10 Jul 2018 12:01:21 -0400 Received: from in02.mta.xmission.com ([166.70.13.52]) by out03.mta.xmission.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.87) (envelope-from ) id 1fcv4q-0006vJ-GE; Tue, 10 Jul 2018 10:01:20 -0600 Received: from [97.119.167.31] (helo=x220.xmission.com) by in02.mta.xmission.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.87) (envelope-from ) id 1fcv4a-0002V4-Sc; Tue, 10 Jul 2018 10:01:20 -0600 From: ebiederm@xmission.com (Eric W. Biederman) To: Oleg Nesterov Cc: Linus Torvalds , Andrew Morton , References: <87h8l9p7bg.fsf@xmission.com> <20180709104158.GA23796@redhat.com> <87sh4so5jv.fsf@xmission.com> <20180709145726.GA26149@redhat.com> <877em4nxo0.fsf@xmission.com> <87lgakm4ol.fsf@xmission.com> <20180710134639.GA2453@redhat.com> Date: Tue, 10 Jul 2018 11:00:55 -0500 In-Reply-To: <20180710134639.GA2453@redhat.com> (Oleg Nesterov's message of "Tue, 10 Jul 2018 15:46:39 +0200") Message-ID: <878t6jkrm0.fsf@xmission.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-SPF: eid=1fcv4a-0002V4-Sc;;;mid=<878t6jkrm0.fsf@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=97.119.167.31;;;frm=ebiederm@xmission.com;;;spf=neutral X-XM-AID: U2FsdGVkX1/yGH40tB9MY4+yXTQf4/t2kLsLrrbyYsU= X-SA-Exim-Connect-IP: 97.119.167.31 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: Re: [Bug 200447] infinite loop in fork syscall X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Oleg Nesterov writes: > On 07/09, Linus Torvalds wrote: >> >> But the patch was written for testing and feedback more than anything >> else. Comments? > > see my reply on bugzilla. Can't we add lkml? Done. > Perhaps we can do another change? Not sure it is actually better, but I think > it is always good to discuss the alternatives. > > 1. Once again, we turn "int group" argument into thread/process/pgid enum, and > change kill_pgrp/tty_signal_session_leader to pass "group = pgid", imo this > makes sense in any case. I agree. There are a lot more multi-process signals cases than handled in Linus's patch. I believe this will clean that up. > 2. To simplify, lets suppose we add the new PF_INFORK flag. Yes, this is bad, > we can do better. I think we can simply add "struct hlist_head forking_threads" > into signal_struct, so complete_signal() can just do hlist_for_each_entry() > rather than for_each_thread() + PF_INFORK check. We don't even need a new > member in task_struct. We still need the distinction between multi-process signals and single process signals (which is the hard part). For good performance of signal delivery to multi-threaded tasks we still need a new member in signal_struct. Plus it is a bit more work to update the list or even walk the list than a sequence counter. So I think adding a sequence counter to let us know about multiprocess signals is the local optimum. If we ever need to remove the restart case entirely from fork and queue all new multi-process signals for the newly created task. Going through the list of forking > > 3. copy_process() can simply block/unblock all signals (except KILL/STOP), see > the "patch" below. All signals are effectively blocked for the duration of the fork for the calling task. Where we get into trouble and where we need a fix for correctness is that another thread can dequeue the signal. Blocking signals of the forking task does not change that. I think that reveals another bug in our current logic. For blocked multi-process signals we don't ensure they get delivered to both the parent and the child if the signal logically comes in after the fork. Multi-threaded b0rked ness and blocked signal b0rkenness, plus periodic timers making for take forever for no good reason. This business of ensuring a signal is logically delvered before or after a fork is tricky. Eric > diff --git a/kernel/fork.c b/kernel/fork.c > index 9440d61..652ef09 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -1591,6 +1591,23 @@ static __latent_entropy struct task_struct *copy_process( > int retval; > struct task_struct *p; > > + spin_lock_irq(¤t->sighand->siglock); > + recalc_sigpending_tsk(current); > + if (signal_pending(current)) { > + retval = -ERESTARTNOINTR; > + } else { > + // Yes, yes, this is wrong, just to explain the idea. > + // We should not block SIGKILL, we need to clear PF_INFORK > + // and restore ->blocked in error paths, we need helper(s). > + retval = 0; > + current->flags |= PF_INFORK; > + current->saved_sigmask = current->blocked; > + sigfillset(¤t->blocked); > + } > + spin_unlock_irq(¤t->sighand->siglock); > + if (retval) > + return retval; > + > /* > * Don't allow sharing the root directory with processes in a different > * namespace > @@ -1918,8 +1935,14 @@ static __latent_entropy struct task_struct *copy_process( > * A fatal signal pending means that current will exit, so the new > * thread can't slip out of an OOM kill (or normal SIGKILL). > */ > - recalc_sigpending(); > - if (signal_pending(current)) { > + // check signal_pending() before __set_task_blocked() which does > + // recalc_sigpending(). Again, this is just to explain the idea, > + // __set_task_blocked() is not exported, it is suboptimal, we can > + // do better. > + bool eintr = signal_pending(); > + current->flags &= ~PF_INFORK; > + __set_task_blocked(current, ¤t->saved_sigmask); > + if (eintr) { > retval = -ERESTARTNOINTR; > goto bad_fork_cancel_cgroup; > } > diff --git a/kernel/signal.c b/kernel/signal.c > index 8d8a940..3433e66 100644 > --- a/kernel/signal.c > +++ b/kernel/signal.c > @@ -900,6 +900,14 @@ static void complete_signal(int sig, struct task_struct *p, int group) > struct signal_struct *signal = p->signal; > struct task_struct *t; > > + // kill_pgrp/etc. > + if (group == 2) { > + for_each_thread(p, t) { > + if (p->flags & PF_INFORK && !sigismember(&p->saved_sigmask, sig)) > + signal_wake_up(t, 0); > + } > + } > + > /* > * Now find a thread we can wake up to take the signal off the queue. > *