From: Oren Laadan <orenl@cs.columbia.edu>
To: Tejun Heo <tj@kernel.org>
Cc: Kapil Arya <kapil@ccs.neu.edu>,
	ksummit-2010-discuss@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org, Gene Cooperman <gene@ccs.neu.edu>,
	hch@lst.de
Subject: Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch
Date: Fri, 05 Nov 2010 18:24:42 -0400
Message-ID: <4CD4842A.5050009@cs.columbia.edu>
In-Reply-To: <4CD26948.7050009@kernel.org>



On 11/04/2010 04:05 AM, Tejun Heo wrote:
> Hello,
>
> On 11/04/2010 04:40 AM, Kapil Arya wrote:
>> (Sorry for resending the message; the last message contained some html
>> tags and was rejected by server)
>
> And please also don't top-post.  Being the antisocial egomaniacs we
> are, people on lkml prefer to dissect the messages we're replying to,
> insert insulting comments right where they would be most effective and
> remove the passages which can't yield effective insults.  :-)
>
>> In our personal view, a key difference between in-kernel and userland
>> approaches is the issue of security.  The Linux C/R developers state
>> the issue very well in their FAQ (question number 7):
>>> https://ckpt.wiki.kernel.org/index.php/Faq :
>>> 7. Can non-root users checkpoint/restart an application ?
>>>
>>> For now, only users with CAP_SYSADMIN privileges can C/R an
>>> application. This is to ensure that the checkpoint image has not been
>>> tampered with and will be treated like a loadable kernel-module.
>
> That's an interesting point but I don't think it's a dealbreaker.
> Kernel CR is gonna require userland agent anyway and access control
> can be done there.

Indeed, this is a restriction on the new eclone() syscall, and can
be addressed with proper userspace tools (including crypto-signing
the checkpoint image). The core of the c/r code allows a user to
restore anything within the user's privilege level.
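
As a rough illustration of the userspace side of this, the sketch
below authenticates a checkpoint image before it is handed to
restart. It is only a sketch under assumptions: the function names,
the key handling and the "tag appended to the image" layout are all
hypothetical, and it uses an HMAC (a shared-key MAC) as a stand-in
for a real public-key signature.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def sign_image(image: bytes, key: bytes) -> bytes:
    """Return the image with an HMAC-SHA256 tag appended (hypothetical layout)."""
    tag = hmac.new(key, image, hashlib.sha256).digest()
    return image + tag


def verify_image(signed: bytes, key: bytes) -> bytes:
    """Return the original image if the tag matches, else raise ValueError."""
    if len(signed) < TAG_LEN:
        raise ValueError("image too short to carry a tag")
    image, tag = signed[:-TAG_LEN], signed[-TAG_LEN:]
    expected = hmac.new(key, image, hashlib.sha256).digest()
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("checkpoint image failed verification")
    return image
```

A restart tool built along these lines would call verify_image()
first and refuse to proceed on ValueError.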

> Being able to snapshot w/o root privilege
> definitely is a plus but it's not like CR is gonna be deployed on
> majority of desktops and servers (if so, let's talk about it then).

Why not?  It has zero overhead when not in use, and a reasonable
code footprint (which can be reduced by modularizing some of it,
but that's beside the point).

>> Strategies like these are easily handled in userspace.  We suspect
>> that while one may begin with a pure kernel approach, eventually,
>> one will still want to add a userland component to achieve this kind
>> of flexibility, just as BLCR has already done.
>
> Yeap, agreed.  There gotta be user agents which can monitor and
> manipulate userland states.  It's a fundamentally nasty job, that of

Are we talking about distributed checkpoint or "standalone" ?

DMTCP relies on user agents to allow distributed/remote execution
in a manner mostly transparent to the application. Many distributed
systems don't require (and do not use) user agents. Consider a
multi-tier system with a web server, an SQL server and some
application server. These are not suited to DMTCP's mode of work.

(This is not to say DMTCP isn't useful - it's a clever piece of
software with specific goals and more geared towards HPC needs).

Now regarding "standalone" c/r: if you want to save/restore a single
process, or a subset of the processes of a system without the rest
of it, then you will always need user agents, regardless of whether
the method is userspace or kernel. Likewise, their work on those
tools will be just as useful whichever c/r 'engine' it uses.

When you include all the relevant processes (e.g. an entire VNC
session, a web server, HPC and batch jobs), you generally don't
need the user agents. The checkpoint is self-contained, and linux-cr
can provide you that guarantee at checkpoint time.
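
One way to picture such a self-containment check is the toy model
below. It is not linux-cr's actual code: the data model (resource
names, per-resource reference counts) and the function name are
invented for illustration. The idea is simply that a set of tasks is
self-contained when the tasks in the set account for every reference
to every resource they use.

```python
from collections import Counter


def find_leaks(tasks: dict[int, list[str]], refcounts: dict[str, int]) -> dict[str, int]:
    """Return resources that are also referenced from outside the set.

    tasks maps each checkpointed pid to the resources it references;
    refcounts gives the system-wide reference count of each resource.
    A resource "leaks" when the tasks in the set account for fewer
    references than the system-wide count. The value returned for a
    leaked resource is the number of outside references.
    """
    seen = Counter(res for refs in tasks.values() for res in refs)
    return {
        res: refcounts[res] - count
        for res, count in seen.items()
        if refcounts[res] > count
    }
```

An empty result means the set is self-contained in this model; a
non-empty one names what is shared with the rest of the system.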

> collecting and applying application-specific workarounds.  I've only
> glanced the dmtcp paper so my understanding is pretty superficial.
> With that in mind, can you please answer some of my curiosities?
>
> * As Oren pointed out in another message, there are some things which
>   could seem a bit too visible to the target application.  Like the
>   manager thread (is it visible to the application or is it hidden by
>   the libc wrapper?) and reserved signal.  Also, while it's true that
>   all programs should be ready to handle -EINTR failure from system
>   calls, it's something which is very difficult to verify and test and
>   could lead to once-in-a-blue-moon head scratchy kind of failures.

If there is a will, there is (almost always) a way ;)

What MTCP does, IIUC, is wrap the application in a complete
pid-namespace (and more) in userspace. There are/were also commercial
products that do that. It's a tremendous effort and I'm impressed by
their (MTCP) work so far.

It is important to understand that it has a price tag: performance
and complexity. It's usually useful for HPC needs, but unsuitable
for the generic server/VPS space.

>
>   I think most of those issues can be tackled with minor narrow-scoped
>   changes to the kernel.  Do you guys have things on mind which the
>   kernel can do to make these things more transparent or safer?

Hmmm... the kernel already does much of it - for instance, we have
neat pid-namespace infrastructure; does it make sense to go to
the trouble of adding interfaces to provide for pid-virtualization
in userspace?  We should be past that ...

Moreover, your objection was based on the apparent complexity of
a badly presented aggregate diff (and I disagree: most of it is
simple refactoring and cleanup). However, that very set of
"narrow-scoped changes" to the kernel that you suggest will
materialize as kernel patches that change more than these do and
achieve less.

> * The feats dmtcp achieves with its set of workarounds are impressive
>   but at the same time look quite hairy.  Christoph said that having a
>   standard userland C-R implementation would be quite useful and IMHO
>   it would be helpful in that direction if the implementation is
>   modularized enough so that the core functionality and the set of
>   workarounds can be easily separated.  Is it already so?

From what I understand, the 'wrapper' functionality to support
distributed operation is said to be well modularized from the
actual c/r engine - which will allow it to use better c/r engines;
and coincidentally, I have one in mind... ;)

Oren.
