All of lore.kernel.org
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	Louis Rilling <louis.rilling@kerlabs.com>,
	Mike Galbraith <efault@gmx.de>,
	Pavel Emelyanov <xemul@parallels.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] pidns: guarantee that the pidns init will be the last pidns process reaped
Date: Wed, 13 Jun 2012 15:17:34 -0700	[thread overview]
Message-ID: <87bokmoo01.fsf@xmission.com> (raw)
In-Reply-To: <20120613145248.f5caab7e.akpm@linux-foundation.org> (Andrew Morton's message of "Wed, 13 Jun 2012 14:52:48 -0700")

Andrew Morton <akpm@linux-foundation.org> writes:

>> --- a/kernel/exit.c
>> +++ b/kernel/exit.c
>> @@ -64,7 +64,6 @@ static void exit_mm(struct task_struct * tsk);
>>  static void __unhash_process(struct task_struct *p, bool group_dead)
>>  {
>>  	nr_threads--;
>> -	detach_pid(p, PIDTYPE_PID);
>>  	if (group_dead) {
>>  		detach_pid(p, PIDTYPE_PGID);
>>  		detach_pid(p, PIDTYPE_SID);
>> @@ -72,7 +71,20 @@ static void __unhash_process(struct task_struct *p, bool group_dead)
>>  		list_del_rcu(&p->tasks);
>>  		list_del_init(&p->sibling);
>>  		__this_cpu_dec(process_counts);
>> +		/*
>> +		 * If we are the last child process in a pid namespace to be
>> +		 * reaped, notify the reaper sleeping zap_pid_ns_processes().
>> +		 */
>> +		if (IS_ENABLED(CONFIG_PID_NS)) {
>> +			struct task_struct *parent = p->real_parent;
>> +
>> +			if ((task_active_pid_ns(p)->child_reaper == parent) &&
>> +			    list_empty(&parent->children) &&
>> +			    (parent->flags & PF_EXITING))
>> +				wake_up_process(parent);
>> +		}
>>  	}
>> +	detach_pid(p, PIDTYPE_PID);
>>  	list_del_rcu(&p->thread_group);
>>  }
>>  
>> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
>> index 57bc1fd..41ed867 100644
>> --- a/kernel/pid_namespace.c
>> +++ b/kernel/pid_namespace.c
>> @@ -179,11 +179,31 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
>>  	}
>>  	read_unlock(&tasklist_lock);
>>  
>> +	/* Firstly reap the EXIT_ZOMBIE children we may have. */
>>  	do {
>>  		clear_thread_flag(TIF_SIGPENDING);
>>  		rc = sys_wait4(-1, NULL, __WALL, NULL);
>>  	} while (rc != -ECHILD);
>>  
>> +	/*
>> +	 * sys_wait4() above can't reap the TASK_DEAD children.
>> +	 * Make sure they all go away, see __unhash_process().
>> +	 */
>> +	for (;;) {
>> +		bool need_wait = false;
>> +
>> +		read_lock(&tasklist_lock);
>> +		if (!list_empty(&current->children)) {
>> +			__set_current_state(TASK_UNINTERRUPTIBLE);
>> +			need_wait = true;
>> +		}
>> +		read_unlock(&tasklist_lock);
>> +
>> +		if (!need_wait)
>> +			break;
>> +		schedule();
>
> This sleep is terminated by the above wake_up_process(), yes?

Yes.

> But that wake_up_process() only happens if CONFIG_PID_NS.  It's
> unobvious (to me) that we can't get stuck in D state if
> CONFIG_PID_NS=n.  If bug, please fix.  If not bug, please make obvious
> to me ;)

The file pid_namespace.c that holds zap_pid_ns_processes is only
compiled with CONFIG_PID_NS set.  So we can't possibly get stuck with
CONFIG_PID_NS=n.

> <looks at the locking a bit>
>
> <gets distracted>
>
> That tty_kref_put() in __exit_signal() is running with tasklist_lock
> held, yes?  It does a ton of work and calls out to random drivers and
> none of this needs tasklist_lock.  Seems risky.

Interesting.  That tty_kref_put does sound like an area where the
locking can be simplified.  At the same time tty_kref_put does make
sense from exit signal.  As ttys and signals are intimately intertwined.

Thank you for taking the time looking at this and applying this change.

I have to agree that the tasklist_lock is pretty horrible, and that if
we can somehow figure out how to remove it we would be in a much better
situation with lock contention.  Unfortunately that is an alligator and
I am working to drain the swamp.

Eric


  reply	other threads:[~2012-06-13 22:17 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20120530175745.GA19327@redhat.com>
2012-05-30 18:14 ` [PATCH 0/2] pidns: premature pid_ns_release_proc() fixes Oleg Nesterov
2012-05-30 18:15   ` [PATCH 1/2] pidns: guarantee that the pidns init will be the last pidns process reaped Oleg Nesterov
2012-05-31 10:32     ` Pavel Emelyanov
2012-06-13 21:52     ` Andrew Morton
2012-06-13 22:17       ` Eric W. Biederman [this message]
2012-06-14 17:20         ` Oleg Nesterov
2012-05-30 18:15   ` [PATCH 2/2] pidns: find_new_reaper() can no longer switch to init_pid_ns.child_reaper Oleg Nesterov
2012-05-31 10:33     ` Pavel Emelyanov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87bokmoo01.fsf@xmission.com \
    --to=ebiederm@xmission.com \
    --cc=akpm@linux-foundation.org \
    --cc=efault@gmx.de \
    --cc=gorcunov@openvz.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=louis.rilling@kerlabs.com \
    --cc=oleg@redhat.com \
    --cc=xemul@parallels.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.