From: Oleg Nesterov <oleg@redhat.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Aaron Tomlin <atomlin@redhat.com>,
Pavel Emelyanov <xemul@parallels.com>,
Serge Hallyn <serge.hallyn@ubuntu.com>,
Sterling Alexander <stalexan@redhat.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] exit: reparent: fix/update the comments in zap_pid_ns_processes()
Date: Tue, 25 Nov 2014 17:57:45 +0100 [thread overview]
Message-ID: <20141125165745.GB28913@redhat.com> (raw)
In-Reply-To: <874mtodzhd.fsf@x220.int.ebiederm.org>
On 11/24, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > The comments in zap_pid_ns_processes() are simply wrong, we need
> > to explain how this code actually works.
> >
> > 1. "Ignore SIGCHLD" looks like optimization but it is not, we also
> > need this for correctness.
> >
> > 2. The comment above sys_wait4() could be more clear.
> >
> > 3. The comment about TASK_DEAD children is outdated. Contrary to
> > what it says we do not need to make sure they all go away.
>
> We absolutely do need to wait until they all go away.
I can easily miss something, and that is why I asked you to review this
change. But if I missed something, perhaps we should update the comments?
This "wait until TASK_DEAD children" loop was added to ensure that
child_reaper can't be reaped before other tasks in this pid namespace.
6347e90091041e "pidns: guarantee that the pidns init will be the last pidns
process reaped".
But this was needed because proc_flush_task(pid == 1) called
kern_unmount(proc_mnt). After 0a01f2cc390e10633 "pidns: Make the pidns
proc mount/umount logic obvious" we can rely on schedule_work() in
free_pid(nr_hashed == 0).
So in any case I think that the current comment is outdated.
> - rusage will be wrong if we don't wait.
I already answered in 0/2, let me quote myself:
Do you mean cstime/cutime/c* accounting?
Firstly it is not clear what makes child_reaper special in _this_ sense, but
this doesn't matter at all.
The auotoreaping/EXIT_DEAD children are not accounted, only wait_task_zombie()
accumulates these counters. (just in case, accounting in __exit_signal() is
another thing).
> - We won't wait for an injected process in a pid namespace,
> or a processes debugged with gdb to be reaped before the pid
> init process exits if we don't wait.
Yes, and I do not see why this is bad, but this is off-topic.
Again, lets discuss this in another thread. This patch doesn't try to
document the desired semantics, it only tries to explain why zap_pid_ns
_has_ to wait until EXIT_DEAD (and in fact EXIT_ZOMBIE) tasks go away.
> The user visible semantics go from weird to completely insane if we
> relax the rule that the init process is the last process in a pid
> namespace.
>
> I don't see anything approaching a good reason for changing the user
> space semantics.
See above.
> > @@ -190,7 +190,11 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
> > /* Don't allow any more processes into the pid namespace */
> > disable_pid_allocation(pid_ns);
>
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> The above line disables injections of new processes in this pid
> namespace.
Yes, sure. See below.
> > - /* Ignore SIGCHLD causing any terminated children to autoreap */
> > + /*
> > + * Ignore SIGCHLD causing any terminated children to autoreap.
> > + * This speeds up the namespace shutdown, plus see the comment
> > + * below.
> > + */
>
> This speeds up the namespace shutdown and ensures that after we
> have waited for all existing zombies there will be no more
> zombies to wait for.
Afaics, no, see below.
> > - * sys_wait4() above can't reap the TASK_DEAD children.
> > - * Make sure they all go away, see free_pid().
> > + * sys_wait4() above can't reap the TASK_DEAD children but we do not
> > + * really care, we could reparent them to the global init.
>
> We do care.
See above. I only meant that nothing bad can happen from the kernel
perspective.
> > + * But this namespace can also have other tasks injected by setns().
> > + * Again, we do not really need to wait until they are all reaped,
>
> We do, and setns does not matter here. Injection was stopped way up above.
I think that setns() does matter.
Yes injection was stopped. But a task T can be already injected before
disable_pid_allocation() was called.
Now it is killed (or it could even exit before). To simplify, suppose it
is already EXIT_ZOMBIE, although this doesn't really matter.
The sys_wait4() loop can't see it, it is not our child.
Now suppose that its parent doesn't do wait() but exits. This means that
the exiting parent will try to reparent T to pid_ns->child_reaper.
child_reaper already sleeps in "wait for nr_hashed == init_pids" loop, it
can do nothing with T.
So we rely on autoreaping (forced by ignored SIGCHLD), and this is the
(technical) reason why child_reaper can't exit: pid_ns->child_reaper should
be valid.
No?
> > + * We rely on ignored SIGCHLD, an injected zombie must be autoreaped
> > + * if reparented.
>
> Your new comment is about 90% wrong.
See above.
Eric. I'd really ask you to take another look. But in any case: thanks for
looking at this.
Oleg.
next prev parent reply other threads:[~2014-11-25 16:57 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-11-07 20:14 [PATCH 0/4] proc: deuglify task_state() Oleg Nesterov
2014-11-07 20:14 ` [PATCH 1/4] proc: task_state: read cred->group_info outside of task_lock() Oleg Nesterov
2014-11-07 20:14 ` [PATCH 2/4] proc: task_state: deuglify the max_fds calculation Oleg Nesterov
2014-11-07 20:14 ` [PATCH 3/4] proc: task_state: move the main seq_printf() outside of rcu_read_lock() Oleg Nesterov
2014-11-13 18:04 ` Paul E. McKenney
2014-11-07 20:14 ` [PATCH 4/4] proc: task_state: ptrace_parent() doesn't need pid_alive() check Oleg Nesterov
2014-11-10 21:59 ` [PATCH 0/5] exit: reparent zombie fix + cleanups/optimizations Oleg Nesterov
2014-11-10 22:00 ` [PATCH 1/5] sched_show_task: fix unsafe usage of ->real_parent Oleg Nesterov
2014-11-11 10:39 ` Peter Zijlstra
2014-11-10 22:00 ` [PATCH 2/5] exit: reparent: use ->ptrace_entry rather than ->sibling for EXIT_DEAD tasks Oleg Nesterov
2014-11-10 22:00 ` [PATCH 3/5] exit: reparent: cleanup the changing of ->parent Oleg Nesterov
2014-11-10 22:00 ` [PATCH 4/5] exit: reparent: cleanup the usage of reparent_leader() Oleg Nesterov
2014-11-10 22:00 ` [PATCH 5/5] exit: ptrace: shift "reap dead" code from exit_ptrace() to forget_original_parent() Oleg Nesterov
2014-11-14 1:37 ` [PATCH 0/5] exit: more cleanups/optimizations Oleg Nesterov
2014-11-14 1:38 ` [PATCH 1/5] exit: wait: cleanup the ptrace_reparented() checks Oleg Nesterov
2014-11-14 1:38 ` [PATCH 2/5] exit: wait: don't use zombie->real_parent Oleg Nesterov
2014-11-14 1:38 ` [PATCH 3/5] exit: wait: drop tasklist_lock before psig->c* accounting Oleg Nesterov
2014-11-14 1:38 ` [PATCH 4/5] exit: release_task: fix the comment about group leader accounting Oleg Nesterov
2014-11-14 1:38 ` [PATCH 5/5] exit: proc: don't try to flush /proc/tgid/task/tgid Oleg Nesterov
2014-11-18 21:29 ` [PATCH 0/6] exit: find_new_reaper() fixes/cleanups Oleg Nesterov
2014-11-18 21:30 ` [PATCH 1/6] exit: reparent: fix the dead-parent PR_SET_CHILD_SUBREAPER reparenting Oleg Nesterov
2014-11-18 21:30 ` [PATCH 2/6] exit: reparent: fix the cross-namespace " Oleg Nesterov
2014-11-18 21:30 ` [PATCH 3/6] exit: reparent: s/while_each_thread/for_each_thread/ in find_new_reaper() Oleg Nesterov
2014-11-18 21:30 ` [PATCH 4/6] exit: reparent: document the ->has_child_subreaper checks Oleg Nesterov
2014-11-18 21:30 ` [PATCH 5/6] exit: reparent: introduce find_child_reaper() Oleg Nesterov
2014-11-18 21:30 ` [PATCH 6/6] exit: reparent: introduce find_alive_thread() Oleg Nesterov
2014-11-20 18:34 ` [PATCH 0/3] exit: avoid O(n ** 2) thread-list scan on group-exit if possible Oleg Nesterov
2014-11-20 18:34 ` [PATCH -mm 1/3] exit: reparent: avoid find_new_reaper() if no children Oleg Nesterov
2014-11-20 22:37 ` Andrew Morton
2014-11-21 20:01 ` Oleg Nesterov
2014-11-20 18:34 ` [PATCH -mm 2/3] exit: reparent: call forget_original_parent() under tasklist_lock Oleg Nesterov
2014-11-20 18:34 ` [PATCH -mm 3/3] exit: exit_notify: re-use "dead" list to autoreap current Oleg Nesterov
2014-11-24 20:06 ` [PATCH 0/2] exit/pid_ns: comments + simple fix Oleg Nesterov
2014-11-24 20:06 ` [PATCH 1/2] exit: reparent: fix/update the comments in zap_pid_ns_processes() Oleg Nesterov
2014-11-24 20:14 ` Oleg Nesterov
2014-11-24 22:07 ` Eric W. Biederman
2014-11-25 16:57 ` Oleg Nesterov [this message]
2014-11-25 17:17 ` Oleg Nesterov
2014-11-24 20:06 ` [PATCH 2/2] exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting Oleg Nesterov
2014-11-24 21:46 ` Eric W. Biederman
2014-11-25 17:07 ` Oleg Nesterov
2014-11-25 17:50 ` Eric W. Biederman
2014-11-25 18:15 ` Oleg Nesterov
2014-11-25 18:43 ` Eric W. Biederman
2014-11-25 18:59 ` Oleg Nesterov
2014-11-24 21:27 ` [PATCH 0/2] exit/pid_ns: comments + simple fix Eric W. Biederman
2014-11-24 21:38 ` Oleg Nesterov
2014-11-24 21:48 ` Eric W. Biederman
2014-11-25 16:57 ` Oleg Nesterov
2014-11-26 23:54 ` [PATCH v2 " Oleg Nesterov
2014-11-26 23:54 ` [PATCH v2 1/2] exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting Oleg Nesterov
2014-11-27 15:44 ` Eric W. Biederman
2014-11-26 23:54 ` [PATCH v2 2/2] exit: pidns: fix/update the comments in zap_pid_ns_processes() Oleg Nesterov
2014-12-01 22:39 ` Andrew Morton
2014-12-01 23:24 ` Oleg Nesterov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141125165745.GB28913@redhat.com \
--to=oleg@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=atomlin@redhat.com \
--cc=ebiederm@xmission.com \
--cc=linux-kernel@vger.kernel.org \
--cc=serge.hallyn@ubuntu.com \
--cc=stalexan@redhat.com \
--cc=xemul@parallels.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.