From: "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
To: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Cc: Linux Containers <containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org>
Subject: Re: [PATCH linux-cr] nsproxy: record ambient namespaces
Date: Tue, 2 Mar 2010 15:25:53 -0600 [thread overview]
Message-ID: <20100302212553.GA16162@hallyn.com> (raw)
In-Reply-To: <4B8D8102.9020500-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>
>
> Serge E. Hallyn wrote:
> > Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> >> Applied.
> >>
> >> Serge E. Hallyn wrote:
> >>> The nsproxy restore path recognizes that an objref of 0 for
> >>> ipc or uts ns means don't unshare it. But the checkpoint side
> >>> forgot to write down 0 when the ipc or uts ns isn't unshared!
> >>>
> >>> Fix that.
> >>>
> >>> To test, run a program with a private pidns but shared utsns
> >>> which does
> >>>
> >>> sleep(5);
> >>> sethostname("serge", 6);
> >>>
> >>> checkpoint it, reset your hostname (if you let the program
> >>> complete), then restart the program: without this patch, it
> >>> will not reset your hostname. It should, and with this patch
> >>> it will.
> >>>
> >>> Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> >>> ---
> >>> kernel/nsproxy.c | 19 +++++++++++++------
> >>> 1 files changed, 13 insertions(+), 6 deletions(-)
> >>>
> >>> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> >>> index 0da0d83..dcb502c 100644
> >>> --- a/kernel/nsproxy.c
> >>> +++ b/kernel/nsproxy.c
> >>> @@ -280,13 +280,20 @@ static int do_checkpoint_ns(struct ckpt_ctx *ctx, struct nsproxy *nsproxy)
> >>> if (!h)
> >>> return -ENOMEM;
> >>> - ret = checkpoint_obj(ctx, nsproxy->uts_ns, CKPT_OBJ_UTS_NS);
> >>> - if (ret <= 0)
> >>> - goto out;
> >>> + ret = 0;
> >>> + if (nsproxy->uts_ns != ctx->root_nsproxy->uts_ns) {
> >>> + ret = checkpoint_obj(ctx, nsproxy->uts_ns, CKPT_OBJ_UTS_NS);
> >>> + if (ret <= 0)
> >>> + goto out;
> >>> + }
> >>> h->uts_objref = ret;
> >>> - ret = checkpoint_obj(ctx, nsproxy->ipc_ns, CKPT_OBJ_IPC_NS);
> >>> - if (ret < 0)
> >>> - goto out;
> >>> +
> >>> + ret = 0;
> >>> + if (nsproxy->ipc_ns != ctx->root_nsproxy->ipc_ns) {
> >>> + ret = checkpoint_obj(ctx, nsproxy->ipc_ns, CKPT_OBJ_IPC_NS);
> >>> + if (ret < 0)
> >>> + goto out;
> >>> + }
> >>> h->ipc_objref = ret;
> >>> /* FIXME: for now, only marked visited to pacify leaks */
> >
> > All right, tihs patch was not right. What we should be checking
> > is whether nsproxy->uts_ns != ctx->root_task->parent->nsproxy->uts_ns.
> > But I don't want to just send the patch to do that until we discuss
> > whether that is the right thing to do.
> >
> > Let me give a precise definition: I call an 'ambient namespace' a
> > namespace which was not unshared when the container was created.
> > Unfortunately there isn't really a reliable way to tell whether that
> > was the case. Checking container_init->parent may depend upon the
> > container init not having been reparented.
>
> Hmm... yeah, I should have looked at it more carefully -
Me too :)
> My original idea is that someone (e.g. userspace) could zero out,
> e.g., the h->uts_objref, and that way allow a restart to "inherit"
> the uts-ns of the parent.
>
> I didn't not do it at checkpoint because (a) I wanted to allow
> flexibility by letting the user choose later, and (b) as you pointed
> out already, it's hard to figure out this property at checkpoint
> anyway.
>
> Using a leak detection is tricky, because if we are doing full
> container checkpoint, we disallow leaks anyway, and if we are
> doing a subtree, then leaks are allowed.
>
> >
> > So as I see it we can do three things:
> >
> > 1. always unshare any namespace which was not empty at checkpoint.
> > So if the container was not unshared from host, and we checkpoint
> > members of that namespace, then at restart we will restart in an
> > unshared namespace and recreate the objects. That basically means
> > undo the patch I originally sent.
> >
> > That means that if the restarted task does 'hostname' it may end
> > up not affecting the hosts's hostname, even if it was originally
> > started on the host without separate utsns. Maybe that's what we
> > want?
> >
>
> This is the default we have used so far, and I'm quite happy with
> it.
>
> > 2. use the simple 'nsproxy->uts_ns != ctx->root_task->parent->nsproxy->uts_ns'
> > test. I think that would be pretty reliable.
> >
> > 3. for each namespace in ctx->root_nsproxy, check whether there are
> > any leaks, and, if so, mark it in the checkpoing image header so that
> > we can give restart a hint that it might not want to unshare those.
>
> If we leave some work to userspace anyway, an alternative to doing
> the accounting in the kernel (remember: this scenario only makes
> sense for non-container checkpoint), is to simply also save the ref
> count of the nsproxy with the nsproxy data (not even each namespace).
> Then user space can figure out if there is a "leak".
>
> Finally, if we do want to allow such a leak (e.g. only the uts-ns
> of the root) in a full container checkpoint, then we will need some
> way (flag ?) to request that when doing the checkpoint.
>
> So for now, I simply revert the patch (unless you object).
Nope. I think it's best.
thanks,
-serge
prev parent reply other threads:[~2010-03-02 21:25 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-02-25 22:56 [PATCH linux-cr] nsproxy: record ambient namespaces Serge E. Hallyn
[not found] ` <20100225225641.GA9386-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-03-01 19:20 ` Oren Laadan
[not found] ` <4B8C136E.5060704-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-03-02 18:42 ` Serge E. Hallyn
[not found] ` <20100302184253.GA18840-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-03-02 21:20 ` Oren Laadan
[not found] ` <4B8D8102.9020500-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-03-02 21:25 ` Serge E. Hallyn [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100302212553.GA16162@hallyn.com \
--to=serge-a9i7lubdfnhqt0dzr+alfa@public.gmane.org \
--cc=containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org \
--cc=orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.