All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH linux-cr] nsproxy: record ambient namespaces
@ 2010-02-25 22:56 Serge E. Hallyn
       [not found] ` <20100225225641.GA9386-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Serge E. Hallyn @ 2010-02-25 22:56 UTC (permalink / raw)
  To: Oren Laadan; +Cc: Linux Containers

The nsproxy restore path recognizes that an objref of 0 for
ipc or uts ns means don't unshare it.  But the checkpoint side
forgot to write down 0 when the ipc or uts ns isn't unshared!

Fix that.

To test, run a program with a private pidns but shared utsns
which does

	sleep(5);
	sethostname("serge", 6);

checkpoint it, reset your hostname (if you let the program
complete), then restart the program: without this patch, it
will not reset your hostname.  It should, and with this patch
it will.

Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
 kernel/nsproxy.c |   19 +++++++++++++------
 1 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 0da0d83..dcb502c 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -280,13 +280,20 @@ static int do_checkpoint_ns(struct ckpt_ctx *ctx, struct nsproxy *nsproxy)
 	if (!h)
 		return -ENOMEM;
 
-	ret = checkpoint_obj(ctx, nsproxy->uts_ns, CKPT_OBJ_UTS_NS);
-	if (ret <= 0)
-		goto out;
+	ret = 0;
+	if (nsproxy->uts_ns != ctx->root_nsproxy->uts_ns) {
+		ret = checkpoint_obj(ctx, nsproxy->uts_ns, CKPT_OBJ_UTS_NS);
+		if (ret <= 0)
+			goto out;
+	}
 	h->uts_objref = ret;
-	ret = checkpoint_obj(ctx, nsproxy->ipc_ns, CKPT_OBJ_IPC_NS);
-	if (ret < 0)
-		goto out;
+
+	ret = 0;
+	if (nsproxy->ipc_ns != ctx->root_nsproxy->ipc_ns) {
+		ret = checkpoint_obj(ctx, nsproxy->ipc_ns, CKPT_OBJ_IPC_NS);
+		if (ret < 0)
+			goto out;
+	}
 	h->ipc_objref = ret;
 
 	/* FIXME: for now, only marked visited to pacify leaks */
-- 
1.6.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH linux-cr] nsproxy: record ambient namespaces
       [not found] ` <20100225225641.GA9386-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2010-03-01 19:20   ` Oren Laadan
       [not found]     ` <4B8C136E.5060704-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Oren Laadan @ 2010-03-01 19:20 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Linux Containers


Applied.

Serge E. Hallyn wrote:
> The nsproxy restore path recognizes that an objref of 0 for
> ipc or uts ns means don't unshare it.  But the checkpoint side
> forgot to write down 0 when the ipc or uts ns isn't unshared!
> 
> Fix that.
> 
> To test, run a program with a private pidns but shared utsns
> which does
> 
> 	sleep(5);
> 	sethostname("serge", 6);
> 
> checkpoint it, reset your hostname (if you let the program
> complete), then restart the program: without this patch, it
> will not reset your hostname.  It should, and with this patch
> it will.
> 
> Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> ---
>  kernel/nsproxy.c |   19 +++++++++++++------
>  1 files changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> index 0da0d83..dcb502c 100644
> --- a/kernel/nsproxy.c
> +++ b/kernel/nsproxy.c
> @@ -280,13 +280,20 @@ static int do_checkpoint_ns(struct ckpt_ctx *ctx, struct nsproxy *nsproxy)
>  	if (!h)
>  		return -ENOMEM;
>  
> -	ret = checkpoint_obj(ctx, nsproxy->uts_ns, CKPT_OBJ_UTS_NS);
> -	if (ret <= 0)
> -		goto out;
> +	ret = 0;
> +	if (nsproxy->uts_ns != ctx->root_nsproxy->uts_ns) {
> +		ret = checkpoint_obj(ctx, nsproxy->uts_ns, CKPT_OBJ_UTS_NS);
> +		if (ret <= 0)
> +			goto out;
> +	}
>  	h->uts_objref = ret;
> -	ret = checkpoint_obj(ctx, nsproxy->ipc_ns, CKPT_OBJ_IPC_NS);
> -	if (ret < 0)
> -		goto out;
> +
> +	ret = 0;
> +	if (nsproxy->ipc_ns != ctx->root_nsproxy->ipc_ns) {
> +		ret = checkpoint_obj(ctx, nsproxy->ipc_ns, CKPT_OBJ_IPC_NS);
> +		if (ret < 0)
> +			goto out;
> +	}
>  	h->ipc_objref = ret;
>  
>  	/* FIXME: for now, only marked visited to pacify leaks */

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH linux-cr] nsproxy: record ambient namespaces
       [not found]     ` <4B8C136E.5060704-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2010-03-02 18:42       ` Serge E. Hallyn
       [not found]         ` <20100302184253.GA18840-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Serge E. Hallyn @ 2010-03-02 18:42 UTC (permalink / raw)
  To: Oren Laadan; +Cc: Linux Containers

Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> 
> Applied.
> 
> Serge E. Hallyn wrote:
> >The nsproxy restore path recognizes that an objref of 0 for
> >ipc or uts ns means don't unshare it.  But the checkpoint side
> >forgot to write down 0 when the ipc or uts ns isn't unshared!
> >
> >Fix that.
> >
> >To test, run a program with a private pidns but shared utsns
> >which does
> >
> >	sleep(5);
> >	sethostname("serge", 6);
> >
> >checkpoint it, reset your hostname (if you let the program
> >complete), then restart the program: without this patch, it
> >will not reset your hostname.  It should, and with this patch
> >it will.
> >
> >Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> >---
> > kernel/nsproxy.c |   19 +++++++++++++------
> > 1 files changed, 13 insertions(+), 6 deletions(-)
> >
> >diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> >index 0da0d83..dcb502c 100644
> >--- a/kernel/nsproxy.c
> >+++ b/kernel/nsproxy.c
> >@@ -280,13 +280,20 @@ static int do_checkpoint_ns(struct ckpt_ctx *ctx, struct nsproxy *nsproxy)
> > 	if (!h)
> > 		return -ENOMEM;
> >-	ret = checkpoint_obj(ctx, nsproxy->uts_ns, CKPT_OBJ_UTS_NS);
> >-	if (ret <= 0)
> >-		goto out;
> >+	ret = 0;
> >+	if (nsproxy->uts_ns != ctx->root_nsproxy->uts_ns) {
> >+		ret = checkpoint_obj(ctx, nsproxy->uts_ns, CKPT_OBJ_UTS_NS);
> >+		if (ret <= 0)
> >+			goto out;
> >+	}
> > 	h->uts_objref = ret;
> >-	ret = checkpoint_obj(ctx, nsproxy->ipc_ns, CKPT_OBJ_IPC_NS);
> >-	if (ret < 0)
> >-		goto out;
> >+
> >+	ret = 0;
> >+	if (nsproxy->ipc_ns != ctx->root_nsproxy->ipc_ns) {
> >+		ret = checkpoint_obj(ctx, nsproxy->ipc_ns, CKPT_OBJ_IPC_NS);
> >+		if (ret < 0)
> >+			goto out;
> >+	}
> > 	h->ipc_objref = ret;
> > 	/* FIXME: for now, only marked visited to pacify leaks */

All right, tihs patch was not right.  What we should be checking
is whether nsproxy->uts_ns != ctx->root_task->parent->nsproxy->uts_ns.
But I don't want to just send the patch to do that until we discuss
whether that is the right thing to do.

Let me give a precise definition:  I call an 'ambient namespace' a
namespace which was not unshared when the container was created.
Unfortunately there isn't really a reliable way to tell whether that
was the case.  Checking container_init->parent may depend upon the
container init not having been reparented.

So as I see it we can do three things:

1. always unshare any namespace which was not empty at checkpoint.
So if the container was not unshared from host, and we checkpoint
members of that namespace, then at restart we will restart in an
unshared namespace and recreate the objects.  That basically means
undo the patch I originally sent.

That means that if the restarted task does 'hostname' it may end
up not affecting the hosts's hostname, even if it was originally
started on the host without separate utsns.  Maybe that's what we
want?

2. use the simple 'nsproxy->uts_ns != ctx->root_task->parent->nsproxy->uts_ns'
test.  I think that would be pretty reliable.

3. for each namespace in ctx->root_nsproxy, check whether there are
any leaks, and, if so, mark it in the checkpoing image header so that
we can give restart a hint that it might not want to unshare those.

-serge

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH linux-cr] nsproxy: record ambient namespaces
       [not found]         ` <20100302184253.GA18840-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2010-03-02 21:20           ` Oren Laadan
       [not found]             ` <4B8D8102.9020500-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Oren Laadan @ 2010-03-02 21:20 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Linux Containers



Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>> Applied.
>>
>> Serge E. Hallyn wrote:
>>> The nsproxy restore path recognizes that an objref of 0 for
>>> ipc or uts ns means don't unshare it.  But the checkpoint side
>>> forgot to write down 0 when the ipc or uts ns isn't unshared!
>>>
>>> Fix that.
>>>
>>> To test, run a program with a private pidns but shared utsns
>>> which does
>>>
>>> 	sleep(5);
>>> 	sethostname("serge", 6);
>>>
>>> checkpoint it, reset your hostname (if you let the program
>>> complete), then restart the program: without this patch, it
>>> will not reset your hostname.  It should, and with this patch
>>> it will.
>>>
>>> Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
>>> ---
>>> kernel/nsproxy.c |   19 +++++++++++++------
>>> 1 files changed, 13 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
>>> index 0da0d83..dcb502c 100644
>>> --- a/kernel/nsproxy.c
>>> +++ b/kernel/nsproxy.c
>>> @@ -280,13 +280,20 @@ static int do_checkpoint_ns(struct ckpt_ctx *ctx, struct nsproxy *nsproxy)
>>> 	if (!h)
>>> 		return -ENOMEM;
>>> -	ret = checkpoint_obj(ctx, nsproxy->uts_ns, CKPT_OBJ_UTS_NS);
>>> -	if (ret <= 0)
>>> -		goto out;
>>> +	ret = 0;
>>> +	if (nsproxy->uts_ns != ctx->root_nsproxy->uts_ns) {
>>> +		ret = checkpoint_obj(ctx, nsproxy->uts_ns, CKPT_OBJ_UTS_NS);
>>> +		if (ret <= 0)
>>> +			goto out;
>>> +	}
>>> 	h->uts_objref = ret;
>>> -	ret = checkpoint_obj(ctx, nsproxy->ipc_ns, CKPT_OBJ_IPC_NS);
>>> -	if (ret < 0)
>>> -		goto out;
>>> +
>>> +	ret = 0;
>>> +	if (nsproxy->ipc_ns != ctx->root_nsproxy->ipc_ns) {
>>> +		ret = checkpoint_obj(ctx, nsproxy->ipc_ns, CKPT_OBJ_IPC_NS);
>>> +		if (ret < 0)
>>> +			goto out;
>>> +	}
>>> 	h->ipc_objref = ret;
>>> 	/* FIXME: for now, only marked visited to pacify leaks */
> 
> All right, tihs patch was not right.  What we should be checking
> is whether nsproxy->uts_ns != ctx->root_task->parent->nsproxy->uts_ns.
> But I don't want to just send the patch to do that until we discuss
> whether that is the right thing to do.
> 
> Let me give a precise definition:  I call an 'ambient namespace' a
> namespace which was not unshared when the container was created.
> Unfortunately there isn't really a reliable way to tell whether that
> was the case.  Checking container_init->parent may depend upon the
> container init not having been reparented.

Hmm... yeah, I should have looked at it more carefully -

My original idea is that someone (e.g. userspace) could zero out,
e.g., the h->uts_objref, and that way allow a restart to "inherit"
the uts-ns of the parent.

I didn't not do it at checkpoint because (a) I wanted to allow
flexibility by letting the user choose later, and (b) as you pointed
out already, it's hard to figure out this property at checkpoint
anyway.

Using a leak detection is tricky, because if we are doing full
container checkpoint, we disallow leaks anyway, and if we are
doing a subtree, then leaks are allowed.

> 
> So as I see it we can do three things:
> 
> 1. always unshare any namespace which was not empty at checkpoint.
> So if the container was not unshared from host, and we checkpoint
> members of that namespace, then at restart we will restart in an
> unshared namespace and recreate the objects.  That basically means
> undo the patch I originally sent.
> 
> That means that if the restarted task does 'hostname' it may end
> up not affecting the hosts's hostname, even if it was originally
> started on the host without separate utsns.  Maybe that's what we
> want?
> 

This is the default we have used so far, and I'm quite happy with
it.

> 2. use the simple 'nsproxy->uts_ns != ctx->root_task->parent->nsproxy->uts_ns'
> test.  I think that would be pretty reliable.
> 
> 3. for each namespace in ctx->root_nsproxy, check whether there are
> any leaks, and, if so, mark it in the checkpoing image header so that
> we can give restart a hint that it might not want to unshare those.

If we leave some work to userspace anyway, an alternative to doing
the accounting in the kernel (remember: this scenario only makes
sense for non-container checkpoint), is to simply also save the ref
count of the nsproxy with the nsproxy data (not even each namespace).
Then user space can figure out if there is a "leak".

Finally, if we do want to allow such a leak (e.g. only the uts-ns
of the root) in a full container checkpoint, then we will need some
way (flag ?) to request that when doing the checkpoint.

So for now, I simply revert the patch (unless you object).

Oren.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH linux-cr] nsproxy: record ambient namespaces
       [not found]             ` <4B8D8102.9020500-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2010-03-02 21:25               ` Serge E. Hallyn
  0 siblings, 0 replies; 5+ messages in thread
From: Serge E. Hallyn @ 2010-03-02 21:25 UTC (permalink / raw)
  To: Oren Laadan; +Cc: Linux Containers

Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> 
> 
> Serge E. Hallyn wrote:
> > Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> >> Applied.
> >>
> >> Serge E. Hallyn wrote:
> >>> The nsproxy restore path recognizes that an objref of 0 for
> >>> ipc or uts ns means don't unshare it.  But the checkpoint side
> >>> forgot to write down 0 when the ipc or uts ns isn't unshared!
> >>>
> >>> Fix that.
> >>>
> >>> To test, run a program with a private pidns but shared utsns
> >>> which does
> >>>
> >>> 	sleep(5);
> >>> 	sethostname("serge", 6);
> >>>
> >>> checkpoint it, reset your hostname (if you let the program
> >>> complete), then restart the program: without this patch, it
> >>> will not reset your hostname.  It should, and with this patch
> >>> it will.
> >>>
> >>> Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> >>> ---
> >>> kernel/nsproxy.c |   19 +++++++++++++------
> >>> 1 files changed, 13 insertions(+), 6 deletions(-)
> >>>
> >>> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> >>> index 0da0d83..dcb502c 100644
> >>> --- a/kernel/nsproxy.c
> >>> +++ b/kernel/nsproxy.c
> >>> @@ -280,13 +280,20 @@ static int do_checkpoint_ns(struct ckpt_ctx *ctx, struct nsproxy *nsproxy)
> >>> 	if (!h)
> >>> 		return -ENOMEM;
> >>> -	ret = checkpoint_obj(ctx, nsproxy->uts_ns, CKPT_OBJ_UTS_NS);
> >>> -	if (ret <= 0)
> >>> -		goto out;
> >>> +	ret = 0;
> >>> +	if (nsproxy->uts_ns != ctx->root_nsproxy->uts_ns) {
> >>> +		ret = checkpoint_obj(ctx, nsproxy->uts_ns, CKPT_OBJ_UTS_NS);
> >>> +		if (ret <= 0)
> >>> +			goto out;
> >>> +	}
> >>> 	h->uts_objref = ret;
> >>> -	ret = checkpoint_obj(ctx, nsproxy->ipc_ns, CKPT_OBJ_IPC_NS);
> >>> -	if (ret < 0)
> >>> -		goto out;
> >>> +
> >>> +	ret = 0;
> >>> +	if (nsproxy->ipc_ns != ctx->root_nsproxy->ipc_ns) {
> >>> +		ret = checkpoint_obj(ctx, nsproxy->ipc_ns, CKPT_OBJ_IPC_NS);
> >>> +		if (ret < 0)
> >>> +			goto out;
> >>> +	}
> >>> 	h->ipc_objref = ret;
> >>> 	/* FIXME: for now, only marked visited to pacify leaks */
> > 
> > All right, tihs patch was not right.  What we should be checking
> > is whether nsproxy->uts_ns != ctx->root_task->parent->nsproxy->uts_ns.
> > But I don't want to just send the patch to do that until we discuss
> > whether that is the right thing to do.
> > 
> > Let me give a precise definition:  I call an 'ambient namespace' a
> > namespace which was not unshared when the container was created.
> > Unfortunately there isn't really a reliable way to tell whether that
> > was the case.  Checking container_init->parent may depend upon the
> > container init not having been reparented.
> 
> Hmm... yeah, I should have looked at it more carefully -

Me too :)

> My original idea is that someone (e.g. userspace) could zero out,
> e.g., the h->uts_objref, and that way allow a restart to "inherit"
> the uts-ns of the parent.
> 
> I didn't not do it at checkpoint because (a) I wanted to allow
> flexibility by letting the user choose later, and (b) as you pointed
> out already, it's hard to figure out this property at checkpoint
> anyway.
> 
> Using a leak detection is tricky, because if we are doing full
> container checkpoint, we disallow leaks anyway, and if we are
> doing a subtree, then leaks are allowed.
> 
> > 
> > So as I see it we can do three things:
> > 
> > 1. always unshare any namespace which was not empty at checkpoint.
> > So if the container was not unshared from host, and we checkpoint
> > members of that namespace, then at restart we will restart in an
> > unshared namespace and recreate the objects.  That basically means
> > undo the patch I originally sent.
> > 
> > That means that if the restarted task does 'hostname' it may end
> > up not affecting the hosts's hostname, even if it was originally
> > started on the host without separate utsns.  Maybe that's what we
> > want?
> > 
> 
> This is the default we have used so far, and I'm quite happy with
> it.
> 
> > 2. use the simple 'nsproxy->uts_ns != ctx->root_task->parent->nsproxy->uts_ns'
> > test.  I think that would be pretty reliable.
> > 
> > 3. for each namespace in ctx->root_nsproxy, check whether there are
> > any leaks, and, if so, mark it in the checkpoing image header so that
> > we can give restart a hint that it might not want to unshare those.
> 
> If we leave some work to userspace anyway, an alternative to doing
> the accounting in the kernel (remember: this scenario only makes
> sense for non-container checkpoint), is to simply also save the ref
> count of the nsproxy with the nsproxy data (not even each namespace).
> Then user space can figure out if there is a "leak".
> 
> Finally, if we do want to allow such a leak (e.g. only the uts-ns
> of the root) in a full container checkpoint, then we will need some
> way (flag ?) to request that when doing the checkpoint.
> 
> So for now, I simply revert the patch (unless you object).

Nope.  I think it's best.

thanks,
-serge

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2010-03-02 21:25 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-02-25 22:56 [PATCH linux-cr] nsproxy: record ambient namespaces Serge E. Hallyn
     [not found] ` <20100225225641.GA9386-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-03-01 19:20   ` Oren Laadan
     [not found]     ` <4B8C136E.5060704-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-03-02 18:42       ` Serge E. Hallyn
     [not found]         ` <20100302184253.GA18840-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-03-02 21:20           ` Oren Laadan
     [not found]             ` <4B8D8102.9020500-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-03-02 21:25               ` Serge E. Hallyn

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.