From: Oleg Nesterov <oleg@redhat.com>
To: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: mhocko@suse.com, avagin@openvz.org, peterz@infradead.org,
linux-kernel@vger.kernel.org, rppt@linux.vnet.ibm.com,
ebiederm@xmission.com, luto@kernel.org, gorcunov@openvz.org,
akpm@linux-foundation.org, mingo@kernel.org, serge@hallyn.com
Subject: Re: [PATCH] pid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes()
Date: Fri, 12 May 2017 17:40:04 +0200 [thread overview]
Message-ID: <20170512154004.GA16946@redhat.com> (raw)
In-Reply-To: <149459442953.20826.15806664408144117255.stgit@localhost.localdomain>
On 05/12, Kirill Tkhai wrote:
>
> Task from parent pid_ns Child reaper
> copy_process() ..
> alloc_pid() ..
> .. zap_pid_ns_processes()
> .. disable_pid_allocation()
> .. read_lock(&tasklist_lock)
> .. iterate over pids in pid_ns
> .. kill tasks linked to pids
> .. read_unlock(&tasklist_lock)
> write_lock_irq(&tasklist_lock); ..
> attach_pid(p, PIDTYPE_PID); ..
> .. ..
>
> So, just created task p won't receive SIGKILL signal,
> and the pid namespace will be in contradictory state.
> Only manual kill will help there, but does the userspace
> care about this? I suppose, the most users just inject
> a task into a pid namespace and wait a SIGCHLD from it.
OK.
> The patch fixes the problem. It moves disable_pid_allocation()
> into find_child_reaper() where tasklist_lock is held,
This looks unnecessary,
> and this allows to simply check for (pid_ns->nr_hashed & PIDNS_HASH_ADDING)
> in copy_process(). If allocation is disabled, we just
> return -ENOMEM like it's made for such cases in alloc_pid().
Yes, but note that zap_pid_ns_processes() does disable_pid_allocation()
and then takes tasklist_lock to kill the whole namespace. Given that
copy_process() checks PIDNS_HASH_ADDING under write_lock(tasklist) they
can't race; if copy_process() takes this lock first, the new child will
be killed, otherwise copy_process() can't miss the change in ->nr_hashed.
So I think you can safely remove the changes in exit.c and pid_namespace.c.
> @@ -1523,6 +1523,7 @@ static __latent_entropy struct task_struct *copy_process(
> unsigned long tls,
> int node)
> {
> + struct pid_namespace *pid_ns;
> int retval;
> struct task_struct *p;
>
> @@ -1735,8 +1736,9 @@ static __latent_entropy struct task_struct *copy_process(
> if (retval)
> goto bad_fork_cleanup_io;
>
> + pid_ns = p->nsproxy->pid_ns_for_children;
> if (pid != &init_struct_pid) {
> - pid = alloc_pid(p->nsproxy->pid_ns_for_children);
> + pid = alloc_pid(pid_ns);
> if (IS_ERR(pid)) {
> retval = PTR_ERR(pid);
> goto bad_fork_cleanup_thread;
> @@ -1845,10 +1847,11 @@ static __latent_entropy struct task_struct *copy_process(
> */
> recalc_sigpending();
> if (signal_pending(current)) {
> - spin_unlock(¤t->sighand->siglock);
> - write_unlock_irq(&tasklist_lock);
> retval = -ERESTARTNOINTR;
> - goto bad_fork_cancel_cgroup;
> + goto bad_fork_unlock_siglock;
> + } else if (unlikely(!(pid_ns->nr_hashed & PIDNS_HASH_ADDING))) {
> + retval = -ENOMEM;
> + goto bad_fork_unlock_siglock;
I won't insist, feel free to ignore... But I don't really like the fact
you add the new pid_ns var, copy_process() is already huge and complex.
Can't you simply use ns_of_pid(pid_ns)->nr_hashed ? Yes, this will add
a couple of additional insns, but imo readability is more important.
And why "else if"? Imho this looks less readable and a bit confusing
compared to 2 subsequent if()'s.
And probably a helper which checks PIDNS_HASH_ADDING makes some sense,
but we can do this separately.
> -bad_fork_cancel_cgroup:
> +bad_fork_unlock_siglock:
> + spin_unlock(¤t->sighand->siglock);
> + write_unlock_irq(&tasklist_lock);
> cgroup_cancel_fork(p);
OK, agreed. Except the new name doesn't match the code ;)
Oleg.
prev parent reply other threads:[~2017-05-12 15:42 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-05-12 13:07 [PATCH] pid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes() Kirill Tkhai
2017-05-12 14:26 ` Eric W. Biederman
2017-05-12 14:47 ` Kirill Tkhai
2017-05-12 14:49 ` Eric W. Biederman
2017-05-12 15:17 ` Kirill Tkhai
2017-05-12 15:40 ` Oleg Nesterov [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170512154004.GA16946@redhat.com \
--to=oleg@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=avagin@openvz.org \
--cc=ebiederm@xmission.com \
--cc=gorcunov@openvz.org \
--cc=ktkhai@virtuozzo.com \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@kernel.org \
--cc=mhocko@suse.com \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=rppt@linux.vnet.ibm.com \
--cc=serge@hallyn.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox