From: Mike Waychison <mikew@google.com>
To: Oren Laadan <orenl@cs.columbia.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
linux-api@vger.kernel.org, containers@lists.linux-foundation.org,
hpa@zytor.com, linux-kernel@vger.kernel.org,
Dave Hansen <dave@linux.vnet.ibm.com>,
linux-mm@kvack.org, viro@zeniv.linux.org.uk, mingo@elte.hu,
mpm@selenic.com, tglx@linutronix.de,
Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>,
Alexey Dobriyan <adobriyan@gmail.com>,
xemul@openvz.org
Subject: Re: How much of a mess does OpenVZ make? ;) Was: What can OpenVZ do?
Date: Wed, 18 Mar 2009 11:54:19 -0700
Message-ID: <49C1435B.1090809@google.com>
In-Reply-To: <49BADFCE.8020207@cs.columbia.edu>
Oren Laadan wrote:
>
> Mike Waychison wrote:
>> Linus Torvalds wrote:
>>> On Thu, 12 Mar 2009, Sukadev Bhattiprolu wrote:
>>>
>>>> Ying Han [yinghan@google.com] wrote:
>>>> | Hi Serge:
>>>> | I made a patch based on Oren's tree recently which implements a new
>>>> | syscall clone_with_pid. I tested with checkpoint/restart process tree
>>>> | and it works as expected.
>>>>
>>>> Yes, I think we had a version of clone() with pid a while ago.
>>> Are people _at_all_ thinking about security?
>>>
>>> Obviously not.
>>>
>>> There's no way we can do anything like this. Sure, it's trivial to do
>>> inside the kernel. But it also sounds like a _wonderful_ attack vector
>>> against badly written user-land software that sends signals and has small
>>> races.
>> I'm not really sure how this is different than a malicious app going off
>> and spawning thousands of threads in an attempt to hit a target pid from
>> a security pov. Sure, it makes it easier, but it's not like there is
>> anything in place to close the attack vector.
>>
>>> Quite frankly, from having followed the discussion(s) over the last few
>>> weeks about checkpoint/restart in various forms, my reaction to just about
>>> _all_ of this is that people pushing this are pretty damn borderline.
>>>
>>> I think you guys are working on all the wrong problems.
>>>
>>> Let's face it, we're not going to _ever_ checkpoint any kind of general
>>> case process. Just TCP makes that fundamentally impossible in the general
>>> case, and there are lots and lots of other cases too (just something as
>>> totally _trivial_ as all the files in the filesystem that don't get rolled
>>> back).
>> In some instances such as ours, TCP is probably the easiest thing to
>> migrate. In an rpc-based cluster application, TCP is nothing more than
>> an RPC channel and applications already have to handle RPC channel
>> failure and re-establishment.
>>
>> I agree that this is not the 'general case' as you mention above
>> however. This is the bit that sorta bothers me with the way the
>> implementation has been going so far on this list. The implementation
>> that folks are building on top of Oren's patchset tries to be everything
>> to everybody. For our purposes, we need to have the flexibility of
>> choosing *how* we checkpoint. The line seems to be arbitrarily drawn at
>> the kernel being responsible for checkpointing and restoring all
>> resources associated with a task, and leaving userland with nothing more
>> than transporting filesystem bits. This approach isn't flexible enough:
>> Consider the case where we want to stub out most of the TCP file
>> descriptors with ECONNRESETed sockets because we know that they are RPC
>> sockets and can re-establish themselves, but we want to use some other
>> mechanism for TCP sockets we don't know much about. The current
>> monolithic approach has zero flexibility for doing anything like this,
>> and I can't figure out how we could even fit anything like this in.
>
> The flexibility exists, but wasn't spelled out, so here it is:
>
> 1) Similar to madvise(), I envision a cradvice() that could tell the c/r
> something about specific resources, e.g.:
> * cradvice(CR_ADV_MEM, ptr, len) -> don't save that memory, it's scratch
> * cradvice(CR_ADV_SOCK, fd, CR_ADV_SOCK_RESET) -> reset connection on restart
> etc .. (nevermind the exact interface right now)
>
> 2) Tasks can ask to be notified (e.g. register a signal) when a checkpoint
> or a restart completes successfully. At that time they can do their private
> house-keeping if they know better.
>
> 3) If restoring some resource is significantly easier in user space (e.g. a
> file-descriptor of some special device which user space knows how to
> re-initialize), then the restarting task can prepare it ahead of time,
> and, call:
> * cradvice(CR_ADV_USERFD, fd, 0) -> use the fd in place instead of trying
> to restore it yourself.
Would this be called by the embryo process (mktree.c?) before it calls
sys_restart?
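
To make the ordering in that question concrete, here is a rough userland
sketch of method #3. Everything in it is an assumption for illustration:
cradvice() and CR_ADV_USERFD are only proposed in the quoted text and do
not exist as a syscall, so cradvice() below is a stand-in that records the
advice in a table, and restart_action_for() is a hypothetical stand-in for
the decision sys_restart would make per file descriptor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical advice code, named after the example quoted above. */
enum cr_advice { CR_ADV_USERFD = 1 };

/* Hypothetical outcome of the restart logic for one descriptor. */
enum fd_action { FD_RESTORE_IN_KERNEL, FD_USE_IN_PLACE };

#define CR_MAX_ADVICE 16

struct cr_advice_entry {
    enum cr_advice kind;
    int fd;
    long arg;
};

static struct cr_advice_entry cr_table[CR_MAX_ADVICE];
static size_t cr_count;

/* Stand-in for the proposed cradvice(): it only records the request. */
static int cradvice(enum cr_advice kind, int fd, long arg)
{
    if (fd < 0 || cr_count == CR_MAX_ADVICE)
        return -1;
    cr_table[cr_count++] = (struct cr_advice_entry){ kind, fd, arg };
    return 0;
}

/* Stand-in for what restart would decide for a given descriptor. */
static enum fd_action restart_action_for(int fd)
{
    assert(cr_count <= CR_MAX_ADVICE);
    for (size_t i = 0; i < cr_count; i++) {
        if (cr_table[i].fd == fd && cr_table[i].kind == CR_ADV_USERFD)
            return FD_USE_IN_PLACE;
    }
    return FD_RESTORE_IN_KERNEL;
}

/*
 * What the embryo process would do: user space has already re-created
 * prepared_fd itself (e.g. re-opened a special device), and it advises
 * the restart logic to leave that descriptor alone. The actual restart
 * call would follow only if this returns 0.
 */
static int embryo_prepare(int prepared_fd)
{
    return cradvice(CR_ADV_USERFD, prepared_fd, 0);
}
```

In this sketch a descriptor that was never advised falls back to
FD_RESTORE_IN_KERNEL, which is the "kernel restores everything" default
the thread describes.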
>
> Method #3 is what I used in Zap to implement distributed checkpoints, where
> it is so much easier to recreate all network connections in user space than
> putting that logic into the kernel.
>
> Now, on the other hand, doing the c/r from userland is much less flexible
> than in the kernel (e.g. epollfd, futex state and much more) and requires
> exposing a tremendous amount of in-kernel data to user space. And we all know
> that exposing internals is always a one-way ticket :(
>
> [...]
>
> Oren.
>
>