public inbox for linux-kernel@vger.kernel.org
From: Matt Helsley <matthltc@us.ibm.com>
To: Grzegorz Nosek <root@localdomain.pl>
Cc: Roland McGrath <roland@redhat.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Sukadev Bhattiprolu <sukadev@us.ibm.com>,
	containers@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org
Subject: Re: Testing lxc 0.6.5 in Fedora 13
Date: Tue, 23 Mar 2010 14:28:34 -0700
Message-ID: <20100323212834.GH20796@count0.beaverton.ibm.com>
In-Reply-To: <20100321195044.GA23757@megiteam.pl>

On Sun, Mar 21, 2010 at 08:50:44PM +0100, Grzegorz Nosek wrote:

<snip>

> 2. Weird strace behaviour across pidns boundary
> 
> When strace'ing (with -ff) lxc-start, I get a proper strace for the
> directly spawned process and the container init. However, any processes
> spawned by the container's init are not straced properly (I get two
> empty files, named <foo>.<pid-in-root-ns> and <foo>.2 -- presumably pid
> inside the container). The container also seems to malfunction under
> strace (looks like exec() failing as lxc-ps shows two "init" processes).
> 
> This is quite painful as it prevents strace'ing processes in containers
> even after startup. Here's a snippet of strace'ing a bash (pid 179
> inside, pid 2959 outside) trying to run 'ls'. The shell hangs until I
> kill the strace process.
> 
> pipe([3, 4])                            = 0
> clone(Process 197 attached
> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7859708) = 197
> Process 2999 attached (waiting for parent)
> [pid  2959] setpgid(197, 197)           = 0
> [pid  2959] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> [pid  2959] rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
> [pid  2959] close(3)                    = 0
> [pid  2959] close(4)                    = 0
> [pid  2959] rt_sigprocmask(SIG_BLOCK, [CHLD TSTP TTIN TTOU], [CHLD], 8) = 0
> [pid  2959] ioctl(255, TIOCSPGRP, [197]) = 0
> [pid  2959] rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0
> [pid  2959] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> [pid  2959] rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
> [pid  2959] waitpid(-1, Process 2959 suspended
> ^C <unfinished ...>
> Process 2959 detached
> Process 197 detached
> Process 2999 detached
> 
> 'strace ls' ran completely inside the container works as expected.

I'm surprised strace of ls works across pid namespaces. I've been looking
at strace, and it seemed to me that one kernel change and a bunch of strace
changes are needed to make strace'ing in child pid namespaces work. Eric
Biederman's setns() patches might also help.

Can you get a little farther with the kernel fix below?

    Fix incorrect pid namespace used by ptrace during fork/vfork/clone
    
    pid namespaces are not used properly by ptrace in do_fork(). When tracing,
    parent != real_parent because parent is the tracing task. Yet the pid in
    the real_parent's namespace is being used in do_fork():
    
    	nr = task_pid_vnr(p); /* uses real_parent's pid namespace */
    	if (clone_flags & CLONE_PARENT_SETTID)
    		put_user(nr, parent_tidptr); /* "real_parent_tidptr" */
    	...
    	tracehook_report_clone_complete(trace, regs,
    					clone_flags, nr, p); /* ptrace broken */
    
    	if (clone_flags & CLONE_VFORK) {
    		freezer_do_not_count();
    		wait_for_completion(&vfork);
    		freezer_count();
    		tracehook_report_vfork_done(p, nr); /* ptrace broken */
    
    In this case re-using the value in nr is wrong.
    
    This bug can be seen by attaching to an already-running task
    in a descendent namespace with strace -f. When the traced task forks,
    strace won't attach to the new task properly because it sees the
    incorrect pid. For example, if root is running on two VTs and
    root@VTN# indicates switching to VT N:
    
    root@VT1# ns_exec -cp /bin/bash
    root@VT1# echo $$
    1
    root@VT2# strace -f -e fork,vfork,clone -p <pid of bash>
    Process 14518 attached - interrupt to quit
    root@VT1# /bin/bash
    <stops -- new bash shell does not respond to input>
    root@VT2#
    clone(Process 15 attached ... ) = 15
    Process 15044 attached (waiting for parent)
    Process 14518 suspended
    <no more output>
    <hit ctrl-c>
    root@VT1# echo $$
    15
    
    strace sees the pid of the new process to attach to as 15 when it should
    really be attaching to pid 15044. Interestingly enough, it does also
    attach to 15044 later but since the initial attach failed it does not
    properly resume the traced task.
    (I assume wait() helped here -- it reported 15044 and hence strace is aware
    that 15044 exists -- I haven't read the strace code to confirm this.)
    
    Miscellaneous Notes re: ptrace and pid namespaces (Documentation/* fodder?):
    
    Note that if the tracer detaches and a tracer from a different ancestor
    pid namespace attaches we'll have the wrong pid number again. The only
    way to fix that is to have ptrace hold a reference to a struct pid
    so long as it may be needed for PTRACE_GETEVENTMSG.
    
    The only way it's possible to ptrace a task outside the tracer's pid
    namespace is if the already-tracing task enters a new descendent pid
    namespace:
    
      tracer	     tracer does		 .
         \		=> clone(CLONE_NEWPID) =>	/ \
         tracee				  tracer   tracee
    
    In this case the pids returned by PTRACE_GETEVENTMSG will be 0.
    Since attaching to tasks that aren't in descendent namespaces is
    not possible, this is a very unlikely problem to encounter.
    
    Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
    Cc: Roland McGrath <roland@redhat.com> (MAINTAINERS: ptrace)
    Cc: Oleg Nesterov <oleg@redhat.com> (MAINTAINERS: ptrace)
    Cc: <utrace folks>
    Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> (pid ns)
    Cc: containers@lists.linux-foundation.org (pid ns)
    Cc: linux-kernel@vger.kernel.org

diff --git a/kernel/fork.c b/kernel/fork.c
index 3a65513..7946ea6 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1404,6 +1404,7 @@ long do_fork(unsigned long clone_flags,
 	 */
 	if (!IS_ERR(p)) {
 		struct completion vfork;
+		int ptrace_pid_vnr;
 
 		trace_sched_process_fork(current, p);
 
@@ -1439,14 +1440,21 @@ long do_fork(unsigned long clone_flags,
 			wake_up_new_task(p, clone_flags);
 		}
 
+		ptrace_pid_vnr = nr;
+		if (unlikely(p->parent != p->real_parent)) {
+			rcu_read_lock();
+			ptrace_pid_vnr = task_pid_nr_ns(p, p->parent->nsproxy->pid_ns);
+			rcu_read_unlock();
+		}
 		tracehook_report_clone_complete(trace, regs,
-						clone_flags, nr, p);
+						clone_flags,
+						ptrace_pid_vnr, p);
 
 		if (clone_flags & CLONE_VFORK) {
 			freezer_do_not_count();
 			wait_for_completion(&vfork);
 			freezer_count();
-			tracehook_report_vfork_done(p, nr);
+			tracehook_report_vfork_done(p, ptrace_pid_vnr);
 		}
 	} else {
 		nr = PTR_ERR(p);
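
To make the namespace mix-up concrete, here is a toy userspace model of the
lookup. It is only a sketch, not kernel code: struct toy_pid, toy_pid_nr_ns()
and toy_clone_event_pid() are hypothetical names invented for illustration,
and namespaces are reduced to a single chain identified by nesting level
(0 = initial namespace). The numbers 15 and 15044 come from the strace
session in the commit message.

```c
#include <assert.h>
#include <stdio.h>

#define TOY_MAX_LEVELS 4

/* Toy stand-in for a pid: one number per enclosing namespace level. */
struct toy_pid {
	int level;              /* level of the namespace the task lives in */
	int nr[TOY_MAX_LEVELS]; /* nr[i] = pid as seen from level i */
};

/*
 * Pid number as seen from the namespace at viewer_level. A viewer that
 * is nested deeper than the task cannot see it and gets 0.
 */
int toy_pid_nr_ns(const struct toy_pid *pid, int viewer_level)
{
	assert(pid->level >= 0 && pid->level < TOY_MAX_LEVELS);
	if (viewer_level < 0 || viewer_level > pid->level)
		return 0;
	return pid->nr[viewer_level];
}

/*
 * Number reported to the tracer when a child is created.
 * forker_level: namespace level of the forking task (real_parent).
 * tracer_level: namespace level of the tracer, or -1 if untraced.
 * patched:      0 = reuse the forker's view, 1 = translate for the tracer.
 */
int toy_clone_event_pid(const struct toy_pid *child, int forker_level,
			int tracer_level, int patched)
{
	int nr = toy_pid_nr_ns(child, forker_level);

	if (patched && tracer_level >= 0)
		nr = toy_pid_nr_ns(child, tracer_level);
	return nr;
}

/* Print both views of the new bash from the example session. */
void toy_print_example(void)
{
	struct toy_pid child = { .level = 1, .nr = { 15044, 15 } };

	printf("unpatched: tracer is told %d\n",
	       toy_clone_event_pid(&child, 1, 0, 0));
	printf("patched:   tracer is told %d\n",
	       toy_clone_event_pid(&child, 1, 0, 1));
}
```

Calling toy_print_example() from a main() shows the two views side by side.
The same lookup also covers the reverse case from the notes: a tracer nested
deeper than its tracee gets 0 back.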
