From: Oren Laadan <orenl@cs.columbia.edu>
To: Tejun Heo <tj@kernel.org>
Cc: Kapil Arya <kapil@ccs.neu.edu>,
ksummit-2010-discuss@lists.linux-foundation.org,
linux-kernel@vger.kernel.org, Gene Cooperman <gene@ccs.neu.edu>,
hch@lst.de
Subject: Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch
Date: Fri, 05 Nov 2010 18:24:42 -0400 [thread overview]
Message-ID: <4CD4842A.5050009@cs.columbia.edu> (raw)
In-Reply-To: <4CD26948.7050009@kernel.org>
On 11/04/2010 04:05 AM, Tejun Heo wrote:
> Hello,
>
> On 11/04/2010 04:40 AM, Kapil Arya wrote:
>> (Sorry for resending the message; the last message contained some html
>> tags and was rejected by server)
>
> And please also don't top-post. Being the antisocial egomaniacs we
> are, people on lkml prefer to dissect the messages we're replying to,
> insert insulting comments right where they would be most effective and
> remove the passages which can't yield effective insults. :-)
>
>> In our personal view, a key difference between in-kernel and userland
>> approaches is the issue of security. The Linux C/R developers state
>> the issue very well in their FAQ (question number 7):
>>> https://ckpt.wiki.kernel.org/index.php/Faq :
>>> 7. Can non-root users checkpoint/restart an application ?
>>>
>>> For now, only users with CAP_SYSADMIN privileges can C/R an
>>> application. This is to ensure that the checkpoint image has not been
>>> tampered with and will be treated like a loadable kernel-module.
>
> That's an interesting point but I don't think it's a dealbreaker.
> Kernel CR is gonna require userland agent anyway and access control
> can be done there.
Indeed, this is a restriction on the new eclone() syscall, and can
be addressed with proper userspace tools (including crypo-sign the
checkpoint image). There core of the c/r code allows a user to
restore anything within the user's privilege level.
> Being able to snapshot w/o root privieldge
> definitely is a plust but it's not like CR is gonna be deployed on
> majority of desktops and servers (if so, let's talk about it then).
Why not ? it has zero overhead when not in use, and a reasonable
code footprint (which can be reduced by modularizing some of it,
but that's outside the point).
>> Strategies like these are easily handled in userspace. We suspect
>> that while one may begin with a pure kernel approach, eventually,
>> one will still want to add a userland component to achieve this kind
>> of flexibility, just as BLCR has already done.
>
> Yeap, agreed. There gotta be user agents which can monitor and
> manipulate userland states. It's a fundamentally nasty job, that of
Are we talking about distributed checkpoint or "standalone" ?
DMTCP relies on user agents to allow distributed/remote execution
in a manner mostly transparent to the application. Many distributed
systems don't require (and do not use) user agents. Consider a
multi-tier system with web server, sql server and some applications
server. These are not suitable to DMTCP's mode or work.
(This is not to say DMTCP isn't useful - it's a clever piece of
software with specific goals and more geared towards HPC needs).
Now regarding "standalone" c/r, if you want to save/restore single
or a subset of processes of a system without the rest of it, then
you will always need user agents, regardless of userspace/kernel
method. Likewise, their work on those tools will be as useful
independently of which c/r 'engine' it uses.
When you include all the relevant processes (e.g. an entire VNC
session, a web server, HPC and batch jobs), you generally don't
need the user agents. The checkpoint is self-contained, and linux-cr
can provide you that guarantee at checkpoint time.
> collecting and applying application-specific workarounds. I've only
> glanced the dmtcp paper so my understanding is pretty superficial.
> With that in mind, can you please answer some of my curiosities?
>
> * As Oren pointed out in another message, there are somethings which
> could seem a bit too visible to the target application. Like the
> manager thread (is it visible to the application or is it hidden by
> the libc wrapper?) and reserved signal. Also, while it's true that
> all programs should be ready to handle -EINTR failure from system
> calls, it's something which is very difficult to verify and test and
> could lead to once-in-a-blue-moon head scratchy kind of failures.
If there is a will, there is (almost always) a way ;)
What MTCP does, IIUC, is wrap around the applications with a complete
pid-namespace (and more) in userspace. There are/were also commercial
products that do that. It's a tremendous effort and I'm impressed by
their (MTCP) work so far.
It is important to understand that it has a price tag: performance
and complexity. It's usually useful for HPC needs, but unsuitable
for the generic server/VPS space.
>
> I think most of those issues can be tackled with minor narrow-scoped
> changes to the kernel. Do you guys have things on mind which the
> kernel can do to make these things more transparent or safer?
Hmmm... the kernel already does much of it - for instance, we have
neat pid-namespace infrastructure; does it make sense to go into
the trouble of adding interfaces to provide for pid-virtalization
in userspace ? we should be past that ...
Moreover, your objection was based on the apparent complexity of
a badly presented aggregate diff (and I disagree: most of that
are simple refactoring and cleanups). However, that very set of
"narrow-scoped changes" to the kernel that you suggest, will take
life in the form of kernel patches that will do more than these
and will achieve less.
> * The feats dmtcp achieves with its set of workarounds are impressive
> but at the same time look quite hairy. Christoph said that having a
> standard userland C-R implementation would be quite useful and IMHO
> it would be helpful in that direction if the implementation is
> modularized enough so that the core functionality and the set of
> workarounds can be easily separated. Is it already so?
From what I understand, the 'wrapper' functionality to support
distributed operation is said to be well modularized from the
actual c/r engine - which will allow it to use better c/r engines;
and coincidentally, I have one in mind... ;)
Oren.
next prev parent reply other threads:[~2010-11-05 22:22 UTC|newest]
Thread overview: 111+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <Pine.LNX.4.64.1011021530470.12128@takamine.ncl.cs.columbia.edu>
2010-11-02 21:35 ` [Ksummit-2010-discuss] checkpoint-restart: naked patch Tejun Heo
2010-11-02 21:47 ` Christoph Hellwig
2010-11-04 1:47 ` Nathan Lynch
2010-11-04 7:36 ` Tejun Heo
2010-11-04 16:04 ` Gene Cooperman
2010-11-04 20:45 ` Nathan Lynch
2010-11-06 6:48 ` Matt Helsley
2010-11-04 4:34 ` Oren Laadan
2010-11-04 14:25 ` Christoph Hellwig
2010-11-04 3:40 ` Kapil Arya
2010-11-04 8:05 ` Tejun Heo
2010-11-04 16:44 ` Gene Cooperman
2010-11-05 9:28 ` Tejun Heo
2010-11-05 23:18 ` Oren Laadan
2010-11-06 10:13 ` Tejun Heo
2010-11-06 0:36 ` Kapil Arya
2010-11-06 22:55 ` Oren Laadan
2010-11-07 19:42 ` Gene Cooperman
2010-11-07 21:30 ` Oren Laadan
2010-11-07 23:05 ` Gene Cooperman
2010-11-08 3:55 ` Oren Laadan
2010-11-08 16:26 ` Gene Cooperman
2010-11-08 18:14 ` Oren Laadan
2010-11-08 18:37 ` Gene Cooperman
2010-11-08 19:34 ` Oren Laadan
2010-11-08 19:05 ` Dan Smith
2010-11-17 11:14 ` Tejun Heo
2010-11-17 15:33 ` Dan Smith
2010-11-17 15:40 ` Tejun Heo
2010-11-17 17:04 ` Alexey Dobriyan
2010-11-17 10:45 ` Tejun Heo
2010-11-17 12:12 ` Tejun Heo
2010-11-06 5:32 ` Matt Helsley
2010-11-06 15:01 ` Oren Laadan
2010-11-06 20:40 ` Gene Cooperman
2010-11-06 22:41 ` Oren Laadan
2010-11-07 18:49 ` Gene Cooperman
2010-11-07 21:59 ` Oren Laadan
2010-11-17 11:57 ` Tejun Heo
2010-11-17 15:39 ` Serge E. Hallyn
2010-11-17 15:46 ` Tejun Heo
2010-11-18 9:13 ` Pavel Emelyanov
2010-11-18 9:48 ` Tejun Heo
2010-11-18 20:13 ` Jose R. Santos
2010-11-19 3:54 ` Serge Hallyn
2010-11-18 19:53 ` Oren Laadan
2010-11-19 4:10 ` Serge Hallyn
2010-11-19 14:04 ` Tejun Heo
2010-11-19 14:36 ` Kirill Korotaev
2010-11-19 15:33 ` Tejun Heo
2010-11-19 16:00 ` Alexey Dobriyan
2010-11-19 16:01 ` Alexey Dobriyan
2010-11-19 16:10 ` Tejun Heo
2010-11-19 16:25 ` Alexey Dobriyan
2010-11-19 16:06 ` Tejun Heo
2010-11-19 16:16 ` Alexey Dobriyan
2010-11-19 16:19 ` Tejun Heo
2010-11-19 16:27 ` Alexey Dobriyan
2010-11-19 16:32 ` Tejun Heo
2010-11-19 16:38 ` Alexey Dobriyan
2010-11-19 16:50 ` Tejun Heo
2010-11-19 16:55 ` Alexey Dobriyan
2010-11-20 17:58 ` Oren Laadan
2010-11-20 18:05 ` Oren Laadan
2010-11-20 18:08 ` Oren Laadan
2010-11-20 18:11 ` Oren Laadan
2010-11-20 18:15 ` Oren Laadan
2010-11-20 19:33 ` Tejun Heo
2010-11-21 8:18 ` Gene Cooperman
2010-11-21 8:21 ` Gene Cooperman
2010-11-22 18:02 ` Sukadev Bhattiprolu
2010-11-23 17:53 ` Oren Laadan
2010-11-24 3:50 ` Kapil Arya
2010-11-25 16:04 ` Oren Laadan
2010-11-29 4:09 ` Gene Cooperman
2010-11-21 22:41 ` Grant Likely
2010-11-22 17:34 ` Oren Laadan
2010-11-22 17:18 ` Oren Laadan
2010-11-17 22:17 ` Matt Helsley
2010-11-18 10:06 ` Tejun Heo
2010-11-18 20:25 ` Oren Laadan
2010-11-07 21:44 ` Oren Laadan
2010-11-07 23:31 ` Gene Cooperman
2010-11-05 22:24 ` Oren Laadan [this message]
2010-11-04 4:03 ` Oren Laadan
2010-11-04 9:43 ` Tejun Heo
2010-11-04 12:48 ` Luck, Tony
2010-11-04 13:06 ` Tejun Heo
2010-11-06 10:12 ` Matt Helsley
2010-11-06 11:03 ` Tejun Heo
2010-11-07 22:59 ` Davide Libenzi
2010-11-08 2:32 ` david
2010-11-18 20:41 ` Oren Laadan
2010-11-05 3:55 ` Kapil Arya
2010-11-05 11:57 ` Luck, Tony
2010-11-05 17:17 ` Gene Cooperman
2010-11-06 1:16 ` Matt Helsley
2010-11-06 4:06 ` Oren Laadan
2010-11-06 5:18 ` Matt Helsley
2010-11-06 21:00 ` Oren Laadan
2010-11-05 17:31 ` Sukadev Bhattiprolu
2010-11-06 21:05 ` Oren Laadan
2010-11-08 16:55 ` Grant Likely
2010-11-08 21:01 ` Nathan Lynch
2010-11-11 6:27 ` Nathan Lynch
2010-11-17 5:29 ` Anton Blanchard
2010-11-17 11:08 ` Tejun Heo
2010-11-18 9:53 ` Alan Cox
2010-11-18 12:27 ` Alexey Dobriyan
2010-11-19 6:33 ` Gene Cooperman
2010-11-21 23:20 ` Grant Likely
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4CD4842A.5050009@cs.columbia.edu \
--to=orenl@cs.columbia.edu \
--cc=gene@ccs.neu.edu \
--cc=hch@lst.de \
--cc=kapil@ccs.neu.edu \
--cc=ksummit-2010-discuss@lists.linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox