Linux Container Development
From: "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
To: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Cc: Linux Containers <containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org>
Subject: Re: [PATCH linux-cr] nsproxy: record ambient namespaces
Date: Tue, 2 Mar 2010 15:25:53 -0600
Message-ID: <20100302212553.GA16162@hallyn.com>
In-Reply-To: <4B8D8102.9020500-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>

Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> 
> 
> Serge E. Hallyn wrote:
> > Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> >> Applied.
> >>
> >> Serge E. Hallyn wrote:
> >>> The nsproxy restore path recognizes that an objref of 0 for
> >>> ipc or uts ns means don't unshare it.  But the checkpoint side
> >>> forgot to write down 0 when the ipc or uts ns isn't unshared!
> >>>
> >>> Fix that.
> >>>
> >>> To test, run a program with a private pidns but shared utsns
> >>> which does
> >>>
> >>> 	sleep(5);
> >>> 	sethostname("serge", 6);
> >>>
> >>> checkpoint it, reset your hostname (if you let the program
> >>> complete), then restart the program: without this patch, it
> >>> will not reset your hostname.  It should, and with this patch
> >>> it will.
> >>>
> >>> Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> >>> ---
> >>> kernel/nsproxy.c |   19 +++++++++++++------
> >>> 1 files changed, 13 insertions(+), 6 deletions(-)
> >>>
> >>> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> >>> index 0da0d83..dcb502c 100644
> >>> --- a/kernel/nsproxy.c
> >>> +++ b/kernel/nsproxy.c
> >>> @@ -280,13 +280,20 @@ static int do_checkpoint_ns(struct ckpt_ctx *ctx, struct nsproxy *nsproxy)
> >>> 	if (!h)
> >>> 		return -ENOMEM;
> >>> -	ret = checkpoint_obj(ctx, nsproxy->uts_ns, CKPT_OBJ_UTS_NS);
> >>> -	if (ret <= 0)
> >>> -		goto out;
> >>> +	ret = 0;
> >>> +	if (nsproxy->uts_ns != ctx->root_nsproxy->uts_ns) {
> >>> +		ret = checkpoint_obj(ctx, nsproxy->uts_ns, CKPT_OBJ_UTS_NS);
> >>> +		if (ret <= 0)
> >>> +			goto out;
> >>> +	}
> >>> 	h->uts_objref = ret;
> >>> -	ret = checkpoint_obj(ctx, nsproxy->ipc_ns, CKPT_OBJ_IPC_NS);
> >>> -	if (ret < 0)
> >>> -		goto out;
> >>> +
> >>> +	ret = 0;
> >>> +	if (nsproxy->ipc_ns != ctx->root_nsproxy->ipc_ns) {
> >>> +		ret = checkpoint_obj(ctx, nsproxy->ipc_ns, CKPT_OBJ_IPC_NS);
> >>> +		if (ret < 0)
> >>> +			goto out;
> >>> +	}
> >>> 	h->ipc_objref = ret;
> >>> 	/* FIXME: for now, only marked visited to pacify leaks */
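The checkpoint-side rule this patch introduces can be modelled in userspace: a namespace that is the very same object as the reference one is written with objref 0 ("don't unshare at restart"), and anything else gets a positive objref. This is a sketch only. The `model_*` types and functions are hypothetical stand-ins for `struct nsproxy`, `checkpoint_obj()` and the image header, not the linux-cr API, and error handling is reduced to a negative return.

```c
/* Hypothetical stand-in for a namespace object (struct uts_namespace etc.). */
struct model_ns {
	int id;
};

/* Hypothetical stand-in for struct nsproxy: only the two fields the patch touches. */
struct model_nsproxy {
	const struct model_ns *uts_ns;
	const struct model_ns *ipc_ns;
};

/* Hypothetical image record mirroring h->uts_objref and h->ipc_objref. */
struct model_hdr_ns {
	int uts_objref;
	int ipc_objref;
};

#define MODEL_MAX_OBJS 16

/* Hypothetical objref table: objrefs start at 1, so 0 stays free to mean
 * "not unshared; do not recreate at restart". */
struct model_objhash {
	const void *objs[MODEL_MAX_OBJS];
	int nr;
};

/* Stand-in for checkpoint_obj(): returns the existing objref if the object
 * was already recorded, a new positive objref otherwise, or -1 when full. */
static int model_checkpoint_obj(struct model_objhash *h, const void *obj)
{
	for (int i = 0; i < h->nr; i++)
		if (h->objs[i] == obj)
			return i + 1;
	if (h->nr == MODEL_MAX_OBJS)
		return -1;
	h->objs[h->nr++] = obj;
	return h->nr;
}

/* The patched decision: a namespace identical to the reference one gets 0. */
static int model_ns_objref(struct model_objhash *h, const void *ns,
			   const void *ref_ns)
{
	if (ns == ref_ns)
		return 0;
	return model_checkpoint_obj(h, ns);
}

/* Fill in the uts/ipc objrefs for one nsproxy; returns 0 or a negative error. */
int model_checkpoint_ns(struct model_objhash *h,
			const struct model_nsproxy *nsproxy,
			const struct model_nsproxy *root_nsproxy,
			struct model_hdr_ns *out)
{
	int ret;

	ret = model_ns_objref(h, nsproxy->uts_ns, root_nsproxy->uts_ns);
	if (ret < 0)
		return ret;
	out->uts_objref = ret;

	ret = model_ns_objref(h, nsproxy->ipc_ns, root_nsproxy->ipc_ns);
	if (ret < 0)
		return ret;
	out->ipc_objref = ret;
	return 0;
}
```

Passing the reference nsproxy as a parameter keeps the rule itself separate from the harder question of which nsproxy is the right thing to compare against.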
> > 
> > All right, this patch was not right.  What we should be checking
> > is whether nsproxy->uts_ns != ctx->root_task->parent->nsproxy->uts_ns.
> > But I don't want to just send the patch to do that until we discuss
> > whether that is the right thing to do.
> > 
> > Let me give a precise definition:  I call an 'ambient namespace' a
> > namespace which was not unshared when the container was created.
> > Unfortunately there isn't really a reliable way to tell whether that
> > was the case.  Checking container_init->parent may depend upon the
> > container init not having been reparented.
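The parent-based test (`nsproxy->uts_ns != ctx->root_task->parent->nsproxy->uts_ns`) and its reparenting weakness can be illustrated with a small userspace model. The `model_*` types are hypothetical simplifications, not kernel structures, and treating a missing parent as "unshared" is an assumption made here for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct model_ns {
	int id;
};

struct model_nsproxy {
	const struct model_ns *uts_ns;
};

struct model_task {
	const struct model_task *parent;     /* current parent; changes on reparenting */
	const struct model_nsproxy *nsproxy;
};

/*
 * Parent-based heuristic: consider the container's uts namespace "unshared"
 * only if it differs from that of the container init's *current* parent.
 * Returns true when the namespace should be recorded with a non-zero objref.
 */
bool model_uts_unshared(const struct model_task *root_task)
{
	const struct model_task *parent = root_task->parent;

	if (parent == NULL)
		return true;    /* nothing to compare against: assume unshared */
	return root_task->nsproxy->uts_ns != parent->nsproxy->uts_ns;
}
```

The weakness is that the check only sees the current parent. Suppose a launcher running in a private uts namespace starts the container init without unsharing the uts namespace again, then exits. The init is reparented to a task in the host namespace, and the check now reports the namespace as unshared, even though it was ambient when the container was created.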
> 
> Hmm... yeah, I should have looked at it more carefully -

Me too :)

> My original idea is that someone (e.g. userspace) could zero out,
> e.g., the h->uts_objref, and that way allow a restart to "inherit"
> the uts-ns of the parent.
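The "zero it out in userspace" idea can be sketched as two pieces: the restart-side interpretation of an objref (0 means inherit, positive means restore), and a tool-side edit of the image record before restart. `struct model_hdr_ns`, `model_restore_action()` and `model_request_inherit_uts()` are hypothetical names for illustration only. Treating a negative objref as a corrupt image is an assumption of this sketch.

```c
/* Hypothetical subset of the on-image nsproxy record. */
struct model_hdr_ns {
	int uts_objref;
	int ipc_objref;
};

enum model_ns_action {
	MODEL_NS_INHERIT,   /* objref 0: keep the namespace of the restarting parent */
	MODEL_NS_RESTORE,   /* objref > 0: recreate (or reuse) the checkpointed namespace */
	MODEL_NS_INVALID,   /* negative objref: treat the image as corrupt */
};

/* How restart might interpret one namespace objref from the image. */
enum model_ns_action model_restore_action(int objref)
{
	if (objref < 0)
		return MODEL_NS_INVALID;
	return objref == 0 ? MODEL_NS_INHERIT : MODEL_NS_RESTORE;
}

/* What a userspace tool might do to the image before restart so that the
 * restarted tasks inherit the uts namespace instead of getting a new one. */
void model_request_inherit_uts(struct model_hdr_ns *h)
{
	h->uts_objref = 0;
}
```

Because the decision is deferred to the image rather than made at checkpoint time, the same checkpoint can be restarted either way.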
> 
> I didn't do it at checkpoint because (a) I wanted to allow
> flexibility by letting the user choose later, and (b) as you pointed
> out already, it's hard to figure out this property at checkpoint
> anyway.
> 
> Using leak detection is tricky, because if we are doing full
> container checkpoint, we disallow leaks anyway, and if we are
> doing a subtree, then leaks are allowed.
> 
> > 
> > So as I see it we can do three things:
> > 
> > 1. always unshare any namespace which was not empty at checkpoint.
> > So if the container was not unshared from host, and we checkpoint
> > members of that namespace, then at restart we will restart in an
> > unshared namespace and recreate the objects.  That basically means
> > undo the patch I originally sent.
> > 
> > That means that if the restarted task does 'hostname' it may end
> > up not affecting the host's hostname, even if it was originally
> > started on the host without separate utsns.  Maybe that's what we
> > want?
> > 
> 
> This is the default we have used so far, and I'm quite happy with
> it.
> 
> > 2. use the simple 'nsproxy->uts_ns != ctx->root_task->parent->nsproxy->uts_ns'
> > test.  I think that would be pretty reliable.
> > 
> > 3. for each namespace in ctx->root_nsproxy, check whether there are
> > any leaks, and, if so, mark it in the checkpoint image header so that
> > we can give restart a hint that it might not want to unshare those.
> 
> If we leave some work to userspace anyway, an alternative to doing
> the accounting in the kernel (remember: this scenario only makes
> sense for non-container checkpoint), is to simply also save the ref
> count of the nsproxy with the nsproxy data (not even each namespace).
> Then user space can figure out if there is a "leak".
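With the nsproxy refcount saved in the image, userspace can detect a "leak" by comparing that count with the number of checkpointed tasks that point at the same nsproxy record. The record layouts and function names below are hypothetical. The sketch also assumes the saved refcount counts only task references; in a real kernel other holders (including the checkpointer itself) could inflate it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical image record for an nsproxy, with its refcount saved. */
struct model_img_nsproxy {
	int objref;      /* objref of this nsproxy in the image */
	int refcount;    /* nsproxy refcount recorded at checkpoint time */
};

/* Hypothetical image record for a task: which nsproxy it uses. */
struct model_img_task {
	int nsproxy_objref;
};

/* Count how many checkpointed tasks reference the given nsproxy objref. */
static int model_users_in_image(const struct model_img_task *tasks,
				size_t nr_tasks, int objref)
{
	int users = 0;

	for (size_t i = 0; i < nr_tasks; i++)
		if (tasks[i].nsproxy_objref == objref)
			users++;
	return users;
}

/* A leak: something outside the checkpointed set also holds this nsproxy. */
bool model_nsproxy_leaked(const struct model_img_nsproxy *nsp,
			  const struct model_img_task *tasks, size_t nr_tasks)
{
	return nsp->refcount > model_users_in_image(tasks, nr_tasks, nsp->objref);
}
```

Keeping this comparison in userspace means the kernel only has to record one extra integer per nsproxy, as suggested above, rather than doing the accounting itself.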
> 
> Finally, if we do want to allow such a leak (e.g. only the uts-ns
> of the root) in a full container checkpoint, then we will need some
> way (flag ?) to request that when doing the checkpoint.
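The leak policy described in this thread (full-container checkpoints refuse leaks, subtree checkpoints allow them) plus an opt-in for leaking only the root uts namespace could look roughly like the check below. The thread only says "some way (flag ?)"; both flag names and their values are hypothetical. The -1 return merely stands for "refuse the checkpoint", not a specific kernel errno.

```c
#include <stdbool.h>

/* Hypothetical checkpoint flags, for illustration only. */
#define MODEL_CKPT_SUBTREE        0x1  /* checkpoint a subtree, not a full container */
#define MODEL_CKPT_ALLOW_UTS_LEAK 0x2  /* tolerate a root uts ns shared with outside */

/* Hypothetical per-namespace leak results for the root nsproxy. */
struct model_leaks {
	bool uts;
	bool ipc;
};

/* Return 0 if the checkpoint may proceed, -1 if it must be refused. */
int model_check_leaks(unsigned int flags, const struct model_leaks *l)
{
	if (flags & MODEL_CKPT_SUBTREE)
		return 0;               /* subtree checkpoint: leaks are allowed */
	if (l->ipc)
		return -1;              /* full container: other leaks stay fatal */
	if (l->uts && !(flags & MODEL_CKPT_ALLOW_UTS_LEAK))
		return -1;              /* uts leak only allowed on explicit request */
	return 0;
}
```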
> 
> So for now, I simply revert the patch (unless you object).

Nope.  I think it's best.

thanks,
-serge

Thread overview: 5+ messages
2010-02-25 22:56 [PATCH linux-cr] nsproxy: record ambient namespaces Serge E. Hallyn
2010-03-01 19:20   ` Oren Laadan
2010-03-02 18:42       ` Serge E. Hallyn
2010-03-02 21:20           ` Oren Laadan
2010-03-02 21:25               ` Serge E. Hallyn [this message]