From: Oren Laadan <orenl@cs.columbia.edu>
To: Alexey Dobriyan <adobriyan@gmail.com>
Cc: containers@lists.osdl.org, Dave Hansen <dave@linux.vnet.ibm.com>,
"Serge E. Hallyn" <serue@us.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Linux-Kernel <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@elte.hu>
Subject: Re: Creating tasks on restart: userspace vs kernel
Date: Tue, 14 Apr 2009 16:10:53 -0400
Message-ID: <49E4EDCD.5010406@cs.columbia.edu>
In-Reply-To: <20090414195909.GA28353@x200.localdomain>
Alexey Dobriyan wrote:
>>> In the end, correctness of chopping will be equal to how well the user
>>> understands that two task_struct's are independent of each other.
>>>
>>>> But it will still be a useful tool for many use cases, like batch cpu jobs,
>>>> some servers, vnc sessions (if you want graphics) etc. Imagine you run
>>>> 'octave' for a week and must reboot now - 'octave' wouldn't care if
>>>> you checkpointed it and then restarted it with a different pid!
>>>>
>>>> <3> Clone with pid:
>>>>
>>>> To restart processes from userspace, there needs to be a way to
>>>> request a specific pid--in the current pid_ns--for the child process
>>>> (clearly, if it isn't in use).
>>>>
>>>> Why is it a disadvantage? To Linus, a syscall clone_with_pid()
>>>> "sounds like a _wonderful_ attack vector against badly written
>>>> user-land software...". Actually, getting a specific pid is possible
>>>> without this syscall. But the point is that it's undesirable to have
>>>> this functionality unrestricted.
>>>>
>>>> So one option is to require root privileges. Another option is to
>>>> restrict such action to a pid_ns created by the same user. Even more so,
>>>> restrict it to only containers that are being restarted.
>>> You want to do a small part in userspace and consequently end up with hacks
>>> both userspace-visible and in-kernel.
>> I want to extend existing kernel interface to leverage fork/clone
>> from user space, AND to allow the flexibility mentioned above (which
>> you conveniently ignored).
>>
>> All hacks are in-kernel, aren't they ?
>
> mktree.c can be viewed as a hack, why not?
Lol .. I meant "all kernel hacks are in-kernel" :)
>
> The whole existence of these requirements. You want a new syscall or SET_NEXT_PID
> or a /proc file or something.
Or embed it into a restart(2) call with special argument.
>
>> As for asking for a specific pid from user space, it can be done by:
>> * a new syscall (restricted to user-owned-namespace or CAP_SYS_ADMIN)
>> * a sys_restart(... SET_NEXT_PID) interface specific for restart (ugh)
>> * setting a special /proc/PID/next_id file which is consulted by fork
>
> /proc/*/next_id was discussed and hopefully died, but no.
>
>> and in all cases, limit this so it is only allowed in a restarting
>> container, under the proper security model (again, e.g., Serge's
>> suggestion).
>>
>>> Pids aren't special, they are struct pid, dynamically allocated and
>>> refcounted just like any other structures.
>>>
>>> They _become_ special for your intended method of restart.
>> They are special. And I allow them not to be restored, as well, if
>> the use case so wishes.
>
> The use case is to restore as much as possible, to a state as close
> as possible to the original. Not going with fork_with_pid() in any form
> helps the kernel ensure correctness of restore and helps avoid surprise
> failure modes from the user's POV.
>
>>> You also have flags in nsproxy image (or where?) like "do clone with
>>> CLONE_NEWUTS".
>> Nope. Read the code.
>
> Which code?
>
> static int cr_write_namespaces(struct cr_ctx *ctx, struct task_struct *t)
> {
>         ...
>
>         new_uts = cr_obj_add_ptr(ctx, nsproxy->uts_ns,
>                                  &hh->uts_ref, CR_OBJ_UTSNS, 0);
>         if (new_uts < 0) {
>                 ret = new_uts;
>                 goto out;
>         }
>
>         hh->flags = 0;
>         if (new_uts)
> ===>            hh->flags |= CLONE_NEWUTS;
>
>         ret = cr_write_obj(ctx, &h, hh);
>         ...
>
>>> This is unneeded!
>>>
>>> nsproxy (or task_struct) image has a reference (objref/position) to the uts_ns image.
>>>
>>> On restart, one looks up the object by reference, or restores it if needed,
>>> takes a refcount and glues the two together. Just like with any other two structures.
>> That's exactly how it's done.
>
> Not for uts_ns and future namespaces.
>
>     ret = cr_restore_utsns(ctx, hh->uts_ref, hh->flags);
>                                              ^^^^^^^^^
>                                              comes from disk
Where else would it come from? It's part of the state saved during
checkpoint.
The flag is there for nested UTS namespaces, where a task in the
container called unshare().
Oren.