public inbox for linux-kernel@vger.kernel.org
From: Oren Laadan <orenl@cs.columbia.edu>
To: Alexey Dobriyan <adobriyan@gmail.com>
Cc: containers@lists.osdl.org, Dave Hansen <dave@linux.vnet.ibm.com>,
	"Serge E. Hallyn" <serue@us.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Linux-Kernel <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@elte.hu>
Subject: Re: Creating tasks on restart: userspace vs kernel
Date: Tue, 14 Apr 2009 16:10:53 -0400
Message-ID: <49E4EDCD.5010406@cs.columbia.edu>
In-Reply-To: <20090414195909.GA28353@x200.localdomain>



Alexey Dobriyan wrote:
>>> In the end correctness of chopping will be equal to how good user
>>> understands that two task_struct's are independent of each other.
>>>
>>>> But it will still be a useful tool for many use cases, like batch cpu jobs,
>>>> some servers, vnc sessions (if you want graphics) etc. Imagine you run
>>>> 'octave' for a week and must reboot now - 'octave' wouldn't care if
>>>> you checkpointed it and then restart with a different pid !
>>>>
>>>> <3> Clone with pid:
>>>>
>>>> To restart processes from userspace, there needs to be a way to
>>>> request a specific pid--in the current pid_ns--for the child process
>>>> (clearly, if it isn't in use).
>>>>
>>>> Why is it a disadvantage ?  to Linus, a syscall clone_with_pid()
>>>> "sounds like a _wonderful_ attack vector against badly written
>>>> user-land software...".  Actually, getting a specific pid is possible
>>>> without this syscall.  But the point is that it's undesirable to have
>>>> this functionality unrestricted.
>>>>
>>>> So one option is to require root privileges. Another option is to
>>>> restrict such action in pid_ns created by the same user. Even more so,
>>>> restrict to only containers that are being restarted.
>>> You want to do small part in userspace and consequently end up with hacks
>>> both userspace-visible and in-kernel.
>> I want to extend existing kernel interface to leverage fork/clone
>> from user space, AND to allow the flexibility mentioned above (which
>> you conveniently ignored).
>>
>> All hacks are in-kernel, aren't they ?
> 
> mktree.c can be viewed as hack, why not?

Lol .. I meant "all kernel hacks are in-kernel" :)

> 
> The whole existence of these requirements. You want new syscall or SET_NEXT_PID
> or /proc file or something.

Or embed it into a restart(2) call with a special argument.

> 
>> As for asking for a specific pid from user space, it can be done by:
>> * a new syscall (restricted to user-owned-namespace or CAP_SYS_ADMIN)
>> * a sys_restart(... SET_NEXT_PID) interface specific for restart (ugh)
>> * setting a special /proc/PID/next_id  file which is consulted by fork
> 
> /proc/*/next_id was discussed and hopefully died, but no.
> 
>> and in all cases, limit this so it can only be allowed in a restarting
>> container, under the proper security model (again, e.g., Serge's
>> suggestion).
>>
>>> Pids aren't special, they are struct pid, dynamically allocated and
>>> refcounted just like any other structures.
>>>
>>> They _become_ special for your intended method of restart.
>> They are special. And I allow them not to be restored, as well, if
>> the use case so wishes.
> 
> The use case is to restore as much as possible to the same state as
> equal as possible. Not going with fork_with_pid() in any form helps
> kernel to ensure correctness of restore and helps to avoid surprise
> failure modes from user POV.
> 
>>> You also have flags in nsproxy image (or where?) like "do clone with
>>> CLONE_NEWUTS".
>> Nope. Read the code.
> 
> Which code?
> 
> 	static int cr_write_namespaces(struct cr_ctx *ctx, struct task_struct *t)
> 	{
> 		...
> 
> 		new_uts = cr_obj_add_ptr(ctx, nsproxy->uts_ns,
> 					&hh->uts_ref, CR_OBJ_UTSNS, 0);
> 		if (new_uts < 0) {
> 			ret = new_uts;
> 			goto out;
> 		}
> 
> 		hh->flags = 0;
> 		if (new_uts)
> 	===>		hh->flags |= CLONE_NEWUTS;
> 
> 		ret = cr_write_obj(ctx, &h, hh);
> 			...
> 
>>> This is unneeded!
>>>
>>> nsproxy (or task_struct) image have reference (objref/position) to uts_ns image.
>>>
>>> On restart, one lookups object by reference or restore it if needed,
>>> takes refcount and glue. Just like with every other two structures.
>> That's exactly how it's done.
> 
> Not for uts_ns and future namespaces.
> 
> 	ret = cr_restore_utsns(ctx, hh->uts_ref, hh->flags);
> 						 ^^^^^^^^^
> 						 comes from disk

Where else would it come from?  It's part of the state saved during
checkpoint.

The flag is there for nested UTS namespaces, where a task in the
container called unshare().
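
To make the exchange above concrete, here is a toy userspace model in C
of how such a flag could be produced at checkpoint and consumed at
restart. It is a sketch under assumptions, not the patch set's code:
obj_add_ptr(), write_namespaces(), restore_utsns(), struct ns_header and
SKETCH_CLONE_NEWUTS are hypothetical stand-ins for cr_obj_add_ptr(),
cr_write_namespaces(), cr_restore_utsns(), hh and CLONE_NEWUTS. The
restart-side behaviour (create the namespace when the flag is set,
otherwise look it up by reference and take a refcount) is one reading of
the discussion, not something the quoted snippets confirm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in; same numeric value as CLONE_NEWUTS in <linux/sched.h>. */
#define SKETCH_CLONE_NEWUTS 0x04000000UL
#define MAX_OBJS 16

/* Checkpoint side: remembers which kernel objects were already written. */
struct obj_table {
	const void *ptr[MAX_OBJS];
	int count;
};

/* Per-task namespace record as it would land in the image (assumed layout). */
struct ns_header {
	int uts_ref;
	unsigned long flags;
};

/* Returns 1 if ptr is seen for the first time, 0 if it was already known.
 * In both cases *ref receives the object's reference number (1-based). */
static int obj_add_ptr(struct obj_table *t, const void *ptr, int *ref)
{
	for (int i = 0; i < t->count; i++) {
		if (t->ptr[i] == ptr) {
			*ref = i + 1;
			return 0;
		}
	}
	assert(t->count < MAX_OBJS);
	t->ptr[t->count] = ptr;
	*ref = ++t->count;
	return 1;
}

/* Mirrors the quoted logic: set the flag only for a first-seen uts_ns. */
static void write_namespaces(struct obj_table *t, const void *uts_ns,
			     struct ns_header *hh)
{
	int is_new = obj_add_ptr(t, uts_ns, &hh->uts_ref);

	hh->flags = 0;
	if (is_new)
		hh->flags |= SKETCH_CLONE_NEWUTS;
}

/* Restart side: a refcounted toy namespace and a reference -> object map. */
struct uts_ns {
	int refcount;
};

struct restore_table {
	struct uts_ns *obj[MAX_OBJS + 1];
	struct uts_ns pool[MAX_OBJS + 1];
};

/* Create the namespace if the image says so, else look it up by reference;
 * either way the caller gets a counted reference (NULL if the image is bad). */
static struct uts_ns *restore_utsns(struct restore_table *rt, int ref,
				    unsigned long flags)
{
	assert(ref > 0 && ref <= MAX_OBJS);
	if (flags & SKETCH_CLONE_NEWUTS) {
		rt->obj[ref] = &rt->pool[ref];
		rt->obj[ref]->refcount = 0;
	}
	if (!rt->obj[ref])
		return NULL;
	rt->obj[ref]->refcount++;
	return rt->obj[ref];
}
```

In this model two tasks sharing one uts_ns yield one record with the
flag set and one without, both carrying the same reference; a task that
called unshare() yields a second flagged record with a new reference.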

Oren.

