public inbox for linux-kernel@vger.kernel.org
From: Oren Laadan <orenl@cs.columbia.edu>
To: Tejun Heo <tj@kernel.org>
Cc: Kapil Arya <kapil@ccs.neu.edu>,
	ksummit-2010-discuss@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org, Gene Cooperman <gene@ccs.neu.edu>,
	hch@lst.de
Subject: Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch
Date: Fri, 05 Nov 2010 18:24:42 -0400
Message-ID: <4CD4842A.5050009@cs.columbia.edu>
In-Reply-To: <4CD26948.7050009@kernel.org>



On 11/04/2010 04:05 AM, Tejun Heo wrote:
> Hello,
>
> On 11/04/2010 04:40 AM, Kapil Arya wrote:
>> (Sorry for resending the message; the last message contained some html
>> tags and was rejected by server)
>
> And please also don't top-post.  Being the antisocial egomaniacs we
> are, people on lkml prefer to dissect the messages we're replying to,
> insert insulting comments right where they would be most effective and
> remove the passages which can't yield effective insults.  :-)
>
>> In our personal view, a key difference between in-kernel and userland
>> approaches is the issue of security.  The Linux C/R developers state
>> the issue very well in their FAQ (question number 7):
>>> https://ckpt.wiki.kernel.org/index.php/Faq :
>>> 7. Can non-root users checkpoint/restart an application ?
>>>
>>> For now, only users with CAP_SYSADMIN privileges can C/R an
>>> application. This is to ensure that the checkpoint image has not been
>>> tampered with and will be treated like a loadable kernel-module.
>
> That's an interesting point but I don't think it's a dealbreaker.
> Kernel CR is gonna require userland agent anyway and access control
> can be done there.

Indeed, this is a restriction on the new eclone() syscall, and it can
be addressed with proper userspace tools (including crypto-signing the
checkpoint image). The core of the c/r code allows a user to
restore anything within the user's privilege level.
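
To make the signing idea concrete, here is a minimal sketch of how a
userspace tool might tag a checkpoint image and refuse to restart from a
tampered one. It is only an illustration under stated assumptions: the
helper names sign_image() and verify_image() are hypothetical and not
part of linux-cr, and it uses a shared-key HMAC-SHA256 as a stand-in for
whatever signing scheme a real tool would choose (a public-key signature
would be the more likely choice in practice).

```python
import hashlib
import hmac

# Hypothetical helpers for illustration only; not part of linux-cr.
TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def sign_image(image: bytes, key: bytes) -> bytes:
    """Return the checkpoint image with an HMAC-SHA256 tag appended."""
    tag = hmac.new(key, image, hashlib.sha256).digest()
    return image + tag


def verify_image(signed: bytes, key: bytes) -> bytes:
    """Return the original image, or raise ValueError if the tag is wrong."""
    if len(signed) < TAG_LEN:
        raise ValueError("signed image too short")
    image, tag = signed[:-TAG_LEN], signed[-TAG_LEN:]
    expected = hmac.new(key, image, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("checkpoint image failed verification")
    return image
```

A restart wrapper would call verify_image() first and hand the image to
the kernel only if it returns; the access-control policy stays entirely
in userspace.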

> Being able to snapshot w/o root privilege
> definitely is a plus but it's not like CR is gonna be deployed on
> majority of desktops and servers (if so, let's talk about it then).

Why not? It has zero overhead when not in use, and a reasonable
code footprint (which can be reduced by modularizing some of it,
but that's beside the point).

>> Strategies like these are easily handled in userspace.  We suspect
>> that while one may begin with a pure kernel approach, eventually,
>> one will still want to add a userland component to achieve this kind
>> of flexibility, just as BLCR has already done.
>
> Yeap, agreed.  There gotta be user agents which can monitor and
> manipulate userland states.  It's a fundamentally nasty job, that of

Are we talking about distributed checkpoint or "standalone"?

DMTCP relies on user agents to allow distributed/remote execution
in a manner mostly transparent to the application. Many distributed
systems don't require (and do not use) user agents. Consider a
multi-tier system with a web server, an SQL server and an application
server. These are not suited to DMTCP's mode of work.

(This is not to say DMTCP isn't useful - it's a clever piece of
software with specific goals and more geared towards HPC needs).

Now regarding "standalone" c/r, if you want to save/restore a single
process or a subset of the processes of a system without the rest of
it, then you will always need user agents, regardless of whether the
method is userspace or kernel. Likewise, their work on those tools
will be just as useful independently of which c/r 'engine' it uses.

When you include all the relevant processes (e.g. an entire VNC
session, a web server, HPC and batch jobs), you generally don't
need the user agents. The checkpoint is self-contained, and linux-cr
can provide you that guarantee at checkpoint time.

> collecting and applying application-specific workarounds.  I've only
> glanced the dmtcp paper so my understanding is pretty superficial.
> With that in mind, can you please answer some of my curiosities?
>
> * As Oren pointed out in another message, there are some things which
>   could seem a bit too visible to the target application.  Like the
>   manager thread (is it visible to the application or is it hidden by
>   the libc wrapper?) and reserved signal.  Also, while it's true that
>   all programs should be ready to handle -EINTR failure from system
>   calls, it's something which is very difficult to verify and test and
>   could lead to once-in-a-blue-moon head scratchy kind of failures.

If there is a will, there is (almost always) a way ;)

What MTCP does, IIUC, is wrap around the applications with a complete
pid-namespace (and more) in userspace. There are/were also commercial
products that do that. It's a tremendous effort and I'm impressed by
their (MTCP) work so far.

It is important to understand that it has a price tag: performance
and complexity. It's usually useful for HPC needs, but unsuitable
for the generic server/VPS space.

>
>   I think most of those issues can be tackled with minor narrow-scoped
>   changes to the kernel.  Do you guys have things on mind which the
>   kernel can do to make these things more transparent or safer?

Hmmm... the kernel already does much of it - for instance, we have
neat pid-namespace infrastructure; does it make sense to go to
the trouble of adding interfaces to provide for pid-virtualization
in userspace? We should be past that ...

Moreover, your objection was based on the apparent complexity of
a badly presented aggregate diff (and I disagree: most of it
is simple refactoring and cleanups). However, that very set of
"narrow-scoped changes" to the kernel that you suggest will take
life in the form of kernel patches that will do more than these
and will achieve less.

> * The feats dmtcp achieves with its set of workarounds are impressive
>   but at the same time look quite hairy.  Christoph said that having a
>   standard userland C-R implementation would be quite useful and IMHO
>   it would be helpful in that direction if the implementation is
>   modularized enough so that the core functionality and the set of
>   workarounds can be easily separated.  Is it already so?

From what I understand, the 'wrapper' functionality to support
distributed operation is said to be well modularized from the
actual c/r engine - which will allow it to use better c/r engines;
and coincidentally, I have one in mind... ;)

Oren.
