From: "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
To: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Cc: Linux Containers <containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org>
Subject: Re: [PATCH linux-cr] nsproxy: record ambient namespaces
Date: Tue, 2 Mar 2010 15:25:53 -0600
Message-ID: <20100302212553.GA16162@hallyn.com>
In-Reply-To: <4B8D8102.9020500-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>
>
> Serge E. Hallyn wrote:
> > Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> >> Applied.
> >>
> >> Serge E. Hallyn wrote:
> >>> The nsproxy restore path recognizes that an objref of 0 for
> >>> ipc or uts ns means don't unshare it. But the checkpoint side
> >>> forgot to write down 0 when the ipc or uts ns isn't unshared!
> >>>
> >>> Fix that.
> >>>
> >>> To test, run a program with a private pidns but shared utsns
> >>> which does
> >>>
> >>> sleep(5);
> >>> sethostname("serge", 6);
> >>>
> >>> checkpoint it, reset your hostname (if you let the program
> >>> complete), then restart the program: without this patch, it
> >>> will not reset your hostname. It should, and with this patch
> >>> it will.
> >>>
> >>> Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> >>> ---
> >>> kernel/nsproxy.c | 19 +++++++++++++------
> >>> 1 files changed, 13 insertions(+), 6 deletions(-)
> >>>
> >>> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> >>> index 0da0d83..dcb502c 100644
> >>> --- a/kernel/nsproxy.c
> >>> +++ b/kernel/nsproxy.c
> >>> @@ -280,13 +280,20 @@ static int do_checkpoint_ns(struct ckpt_ctx *ctx, struct nsproxy *nsproxy)
> >>>  	if (!h)
> >>>  		return -ENOMEM;
> >>> -	ret = checkpoint_obj(ctx, nsproxy->uts_ns, CKPT_OBJ_UTS_NS);
> >>> -	if (ret <= 0)
> >>> -		goto out;
> >>> +	ret = 0;
> >>> +	if (nsproxy->uts_ns != ctx->root_nsproxy->uts_ns) {
> >>> +		ret = checkpoint_obj(ctx, nsproxy->uts_ns, CKPT_OBJ_UTS_NS);
> >>> +		if (ret <= 0)
> >>> +			goto out;
> >>> +	}
> >>>  	h->uts_objref = ret;
> >>> -	ret = checkpoint_obj(ctx, nsproxy->ipc_ns, CKPT_OBJ_IPC_NS);
> >>> -	if (ret < 0)
> >>> -		goto out;
> >>> +
> >>> +	ret = 0;
> >>> +	if (nsproxy->ipc_ns != ctx->root_nsproxy->ipc_ns) {
> >>> +		ret = checkpoint_obj(ctx, nsproxy->ipc_ns, CKPT_OBJ_IPC_NS);
> >>> +		if (ret < 0)
> >>> +			goto out;
> >>> +	}
> >>>  	h->ipc_objref = ret;
> >>>  	/* FIXME: for now, only marked visited to pacify leaks */
> >
> > All right, this patch was not right. What we should be checking
> > is whether nsproxy->uts_ns != ctx->root_task->parent->nsproxy->uts_ns.
> > But I don't want to just send the patch to do that until we discuss
> > whether that is the right thing to do.
> >
> > Let me give a precise definition: I call an 'ambient namespace' a
> > namespace which was not unshared when the container was created.
> > Unfortunately there isn't really a reliable way to tell whether that
> > was the case. Checking container_init->parent may depend upon the
> > container init not having been reparented.
>
> Hmm... yeah, I should have looked at it more carefully -
Me too :)
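
To make the difference between the two comparisons concrete, here is a
rough userspace model. It is only a sketch: the *_model structs and both
function names are made up for illustration and are not kernel code. It
assumes the container root's parent pointer still refers to the original
parent (no reparenting), which is exactly the caveat raised above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * needed for the comparison are modeled. */
struct uts_ns_model {
	int id;
};

struct nsproxy_model {
	struct uts_ns_model *uts_ns;
};

struct task_model {
	struct task_model *parent;
	struct nsproxy_model *nsproxy;
};

/* Comparison used by the patch: a task's utsns against the container
 * root's own utsns. When t == root this is trivially true. */
int uts_shared_with_root(const struct task_model *t,
			 const struct task_model *root)
{
	return t->nsproxy->uts_ns == root->nsproxy->uts_ns;
}

/* Comparison proposed instead: a task's utsns against the utsns of the
 * container root's parent. Assumes root has not been reparented. */
int uts_is_ambient(const struct task_model *t,
		   const struct task_model *root)
{
	assert(root->parent != NULL);
	return t->nsproxy->uts_ns == root->parent->nsproxy->uts_ns;
}
```

In this model, a container root with a private utsns still compares
equal to itself under the first check, so its objref would be recorded
as 0. Only the second check tells the private case apart from the
ambient one.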
> My original idea is that someone (e.g. userspace) could zero out,
> e.g., the h->uts_objref, and that way allow a restart to "inherit"
> the uts-ns of the parent.
>
> I didn't do it at checkpoint because (a) I wanted to allow
> flexibility by letting the user choose later, and (b) as you pointed
> out already, it's hard to figure out this property at checkpoint
> anyway.
>
> Using leak detection is tricky, because if we are doing full
> container checkpoint, we disallow leaks anyway, and if we are
> doing a subtree, then leaks are allowed.
>
> >
> > So as I see it we can do three things:
> >
> > 1. always unshare any namespace which was not empty at checkpoint.
> > So if the container was not unshared from host, and we checkpoint
> > members of that namespace, then at restart we will restart in an
> > unshared namespace and recreate the objects. That basically means
> > undo the patch I originally sent.
> >
> > That means that if the restarted task does 'hostname' it may end
> > up not affecting the host's hostname, even if it was originally
> > started on the host without separate utsns. Maybe that's what we
> > want?
> >
>
> This is the default we have used so far, and I'm quite happy with
> it.
>
> > 2. use the simple 'nsproxy->uts_ns != ctx->root_task->parent->nsproxy->uts_ns'
> > test. I think that would be pretty reliable.
> >
> > 3. for each namespace in ctx->root_nsproxy, check whether there are
> > any leaks, and, if so, mark it in the checkpoint image header so that
> > we can give restart a hint that it might not want to unshare those.
>
> If we leave some work to userspace anyway, an alternative to doing
> the accounting in the kernel (remember: this scenario only makes
> sense for non-container checkpoint), is to simply also save the ref
> count of the nsproxy with the nsproxy data (not even each namespace).
> Then user space can figure out if there is a "leak".
>
> Finally, if we do want to allow such a leak (e.g. only the uts-ns
> of the root) in a full container checkpoint, then we will need some
> way (flag ?) to request that when doing the checkpoint.
>
> So for now, I simply revert the patch (unless you object).
Nope. I think it's best.
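
For later reference, the refcount idea could look roughly like this on
the userspace side. This is a sketch under assumptions: the record
layout, the field names and both functions are hypothetical (nothing
like this exists in the image format today), and it assumes every
reference counted at checkpoint time belongs to exactly one task.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record, as userspace might parse it from an image. */
struct nsproxy_record {
	int objref;          /* objref of this nsproxy in the image */
	size_t saved_count;  /* nsproxy refcount saved at checkpoint */
};

/* Count the checkpointed tasks whose nsproxy objref matches. */
size_t count_users(const int *task_ns_objrefs, size_t ntasks, int objref)
{
	size_t i, n = 0;

	for (i = 0; i < ntasks; i++)
		if (task_ns_objrefs[i] == objref)
			n++;
	return n;
}

/* Return 1 if the saved refcount exceeds the number of checkpointed
 * users, i.e. something outside the image also held the nsproxy. */
int nsproxy_leaks(const struct nsproxy_record *rec,
		  const int *task_ns_objrefs, size_t ntasks)
{
	assert(rec != NULL);
	return rec->saved_count >
	       count_users(task_ns_objrefs, ntasks, rec->objref);
}
```

A restart tool could then treat a leaking nsproxy as a hint not to
unshare, and leave the final choice to the user.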
thanks,
-serge