public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: <mhocko@suse.com>, <avagin@openvz.org>, <peterz@infradead.org>,
	<oleg@redhat.com>, <linux-kernel@vger.kernel.org>,
	<rppt@linux.vnet.ibm.com>, <luto@kernel.org>,
	<gorcunov@openvz.org>, <akpm@linux-foundation.org>,
	<mingo@kernel.org>, <serge@hallyn.com>
Subject: Re: [PATCH] pid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes()
Date: Fri, 12 May 2017 09:26:19 -0500	[thread overview]
Message-ID: <87shka40dw.fsf@xmission.com> (raw)
In-Reply-To: <149459442953.20826.15806664408144117255.stgit@localhost.localdomain> (Kirill Tkhai's message of "Fri, 12 May 2017 16:07:28 +0300")

Kirill Tkhai <ktkhai@virtuozzo.com> writes:

> Imagine we have a pid namespace and a task from its parent's pid_ns,
> which made setns() to the pid namespace. The task is doing fork(),
> while the pid namespace's child reaper is dying. We have the race
> between them:
>
> Task from parent pid_ns             Child reaper
> copy_process()                      ..
>   alloc_pid()                       ..
>   ..                                zap_pid_ns_processes()
>   ..                                  disable_pid_allocation()
>   ..                                  read_lock(&tasklist_lock)
>   ..                                  iterate over pids in pid_ns
>   ..                                    kill tasks linked to pids
>   ..                                  read_unlock(&tasklist_lock)
>   write_lock_irq(&tasklist_lock);   ..
>   attach_pid(p, PIDTYPE_PID);       ..
>   ..                                ..
>
> So, just created task p won't receive SIGKILL signal,
> and the pid namespace will be in contradictory state.
> Only manual kill will help there, but does the userspace
> care about this? I suppose, the most users just inject
> a task into a pid namespace and wait a SIGCHLD from it.
>
> The patch fixes the problem. It moves disable_pid_allocation()
> into find_child_reaper() where tasklist_lock is held,
> and this allows to simply check for (pid_ns->nr_hashed & PIDNS_HASH_ADDING)
> in copy_process(). If allocation is disabled, we just
> return -ENOMEM like it's made for such cases in alloc_pid().

This problem sounds very theoretical has it ever come up in practice?
I am asking to see if this is something we will care enough about to
backport.

Please look at what happens when you call
spin_unlock_irq(&pidmap_lock) under writelock_irq(&tasklist_lock);

Please also look at what happens when pid == &init_pid but
p->nsproxy->pid_ns_for_children happens to be have PIDNS_HASH_ADDING
set.

All of that said I think this is a fix worth fixing.

Eric

> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Ingo Molnar <mingo@kernel.org>
> CC: Peter Zijlstra <peterz@infradead.org>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: Mike Rapoport <rppt@linux.vnet.ibm.com>
> CC: Michal Hocko <mhocko@suse.com>
> CC: Andy Lutomirski <luto@kernel.org>
> CC: "Eric W. Biederman" <ebiederm@xmission.com>
> CC: Andrei Vagin <avagin@openvz.org>
> CC: Cyrill Gorcunov <gorcunov@openvz.org>
> CC: Serge Hallyn <serge@hallyn.com>
> ---
>  kernel/exit.c          |    2 ++
>  kernel/fork.c          |   15 ++++++++++-----
>  kernel/pid_namespace.c |    3 ---
>  3 files changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 516acdb0e0ec..9310e69fbc5f 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -586,6 +586,8 @@ static struct task_struct *find_child_reaper(struct task_struct *father)
>  		return reaper;
>  	}
>  
> +	/* Don't allow any more processes into the pid namespace */
> +	disable_pid_allocation(pid_ns);
>  	write_unlock_irq(&tasklist_lock);
>  	if (unlikely(pid_ns == &init_pid_ns)) {
>  		panic("Attempted to kill init! exitcode=0x%08x\n",
> diff --git a/kernel/fork.c b/kernel/fork.c
> index bfd91b180778..dbafabf6c7b1 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1523,6 +1523,7 @@ static __latent_entropy struct task_struct *copy_process(
>  					unsigned long tls,
>  					int node)
>  {
> +	struct pid_namespace *pid_ns;
>  	int retval;
>  	struct task_struct *p;
>  
> @@ -1735,8 +1736,9 @@ static __latent_entropy struct task_struct *copy_process(
>  	if (retval)
>  		goto bad_fork_cleanup_io;
>  
> +	pid_ns = p->nsproxy->pid_ns_for_children;
>  	if (pid != &init_struct_pid) {
> -		pid = alloc_pid(p->nsproxy->pid_ns_for_children);
> +		pid = alloc_pid(pid_ns);
>  		if (IS_ERR(pid)) {
>  			retval = PTR_ERR(pid);
>  			goto bad_fork_cleanup_thread;
> @@ -1845,10 +1847,11 @@ static __latent_entropy struct task_struct *copy_process(
>  	*/
>  	recalc_sigpending();
>  	if (signal_pending(current)) {
> -		spin_unlock(&current->sighand->siglock);
> -		write_unlock_irq(&tasklist_lock);
>  		retval = -ERESTARTNOINTR;
> -		goto bad_fork_cancel_cgroup;
> +		goto bad_fork_unlock_siglock;
> +	} else if (unlikely(!(pid_ns->nr_hashed & PIDNS_HASH_ADDING))) {
> +		retval = -ENOMEM;
> +		goto bad_fork_unlock_siglock;
>  	}
>  
>  	if (likely(p->pid)) {
> @@ -1906,7 +1909,9 @@ static __latent_entropy struct task_struct *copy_process(
>  
>  	return p;
>  
> -bad_fork_cancel_cgroup:
> +bad_fork_unlock_siglock:
> +	spin_unlock(&current->sighand->siglock);
> +	write_unlock_irq(&tasklist_lock);
>  	cgroup_cancel_fork(p);
>  bad_fork_free_pid:
>  	cgroup_threadgroup_change_end(current);
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index d1f3e9f558b8..aedf86a8017e 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -210,9 +210,6 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
>  	struct task_struct *task, *me = current;
>  	int init_pids = thread_group_leader(me) ? 1 : 2;
>  
> -	/* Don't allow any more processes into the pid namespace */
> -	disable_pid_allocation(pid_ns);
> -
>  	/*
>  	 * Ignore SIGCHLD causing any terminated children to autoreap.
>  	 * This speeds up the namespace shutdown, plus see the comment

  reply	other threads:[~2017-05-12 14:32 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-12 13:07 [PATCH] pid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes() Kirill Tkhai
2017-05-12 14:26 ` Eric W. Biederman [this message]
2017-05-12 14:47   ` Kirill Tkhai
2017-05-12 14:49     ` Eric W. Biederman
2017-05-12 15:17       ` Kirill Tkhai
2017-05-12 15:40 ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87shka40dw.fsf@xmission.com \
    --to=ebiederm@xmission.com \
    --cc=akpm@linux-foundation.org \
    --cc=avagin@openvz.org \
    --cc=gorcunov@openvz.org \
    --cc=ktkhai@virtuozzo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mhocko@suse.com \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=serge@hallyn.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox