From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932824AbdELPmT (ORCPT ); Fri, 12 May 2017 11:42:19 -0400 Received: from mx1.redhat.com ([209.132.183.28]:42570 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757352AbdELPkx (ORCPT ); Fri, 12 May 2017 11:40:53 -0400 DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 7811E7F3F8 Authentication-Results: ext-mx01.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx01.extmail.prod.ext.phx2.redhat.com; spf=pass smtp.mailfrom=oleg@redhat.com DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.redhat.com 7811E7F3F8 Date: Fri, 12 May 2017 17:40:04 +0200 From: Oleg Nesterov To: Kirill Tkhai Cc: mhocko@suse.com, avagin@openvz.org, peterz@infradead.org, linux-kernel@vger.kernel.org, rppt@linux.vnet.ibm.com, ebiederm@xmission.com, luto@kernel.org, gorcunov@openvz.org, akpm@linux-foundation.org, mingo@kernel.org, serge@hallyn.com Subject: Re: [PATCH] pid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes() Message-ID: <20170512154004.GA16946@redhat.com> References: <149459442953.20826.15806664408144117255.stgit@localhost.localdomain> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <149459442953.20826.15806664408144117255.stgit@localhost.localdomain> User-Agent: Mutt/1.5.18 (2008-05-17) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Fri, 12 May 2017 15:40:53 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 05/12, Kirill Tkhai wrote: > > Task from parent pid_ns Child reaper > copy_process() .. > alloc_pid() .. > .. zap_pid_ns_processes() > .. disable_pid_allocation() > .. read_lock(&tasklist_lock) > .. iterate over pids in pid_ns > .. kill tasks linked to pids > .. read_unlock(&tasklist_lock) > write_lock_irq(&tasklist_lock); .. > attach_pid(p, PIDTYPE_PID); .. > .. .. > > So, just created task p won't receive SIGKILL signal, > and the pid namespace will be in contradictory state. > Only manual kill will help there, but does the userspace > care about this? I suppose, the most users just inject > a task into a pid namespace and wait a SIGCHLD from it. OK. > The patch fixes the problem. It moves disable_pid_allocation() > into find_child_reaper() where tasklist_lock is held, This looks unnecessary, > and this allows to simply check for (pid_ns->nr_hashed & PIDNS_HASH_ADDING) > in copy_process(). If allocation is disabled, we just > return -ENOMEM like it's made for such cases in alloc_pid(). Yes, but note that zap_pid_ns_processes() does disable_pid_allocation() and then takes tasklist_lock to kill the whole namespace. Given that copy_process() checks PIDNS_HASH_ADDING under write_lock(tasklist) they can't race; if copy_process() takes this lock first, the new child will be killed, otherwise copy_process() can't miss the change in ->nr_hashed. So I think you can safely remove the changes in exit.c and pid_namespace.c. > @@ -1523,6 +1523,7 @@ static __latent_entropy struct task_struct *copy_process( > unsigned long tls, > int node) > { > + struct pid_namespace *pid_ns; > int retval; > struct task_struct *p; > > @@ -1735,8 +1736,9 @@ static __latent_entropy struct task_struct *copy_process( > if (retval) > goto bad_fork_cleanup_io; > > + pid_ns = p->nsproxy->pid_ns_for_children; > if (pid != &init_struct_pid) { > - pid = alloc_pid(p->nsproxy->pid_ns_for_children); > + pid = alloc_pid(pid_ns); > if (IS_ERR(pid)) { > retval = PTR_ERR(pid); > goto bad_fork_cleanup_thread; > @@ -1845,10 +1847,11 @@ static __latent_entropy struct task_struct *copy_process( > */ > recalc_sigpending(); > if (signal_pending(current)) { > - spin_unlock(¤t->sighand->siglock); > - write_unlock_irq(&tasklist_lock); > retval = -ERESTARTNOINTR; > - goto bad_fork_cancel_cgroup; > + goto bad_fork_unlock_siglock; > + } else if (unlikely(!(pid_ns->nr_hashed & PIDNS_HASH_ADDING))) { > + retval = -ENOMEM; > + goto bad_fork_unlock_siglock; I won't insist, feel free to ignore... But I don't really like the fact you add the new pid_ns var, copy_process() is already huge and complex. Can't you simply use ns_of_pid(pid_ns)->nr_hashed ? Yes, this will add a couple of additional insns, but imo readability is more important. And why "else if"? Imho this looks less readable and a bit confusing compared to 2 subsequent if()'s. And probably a helper which checks PIDNS_HASH_ADDING makes some sense, but we can do this separately. > -bad_fork_cancel_cgroup: > +bad_fork_unlock_siglock: > + spin_unlock(¤t->sighand->siglock); > + write_unlock_irq(&tasklist_lock); > cgroup_cancel_fork(p); OK, agreed. Except the new name doesn't match the code ;) Oleg.