From: Tejun Heo <tj@kernel.org>
To: Kirill Korotaev <dev@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>,
	Kapil Arya <kapil@ccs.neu.edu>, Gene Cooperman <gene@ccs.neu.edu>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Pavel Emelianov <xemul@parallels.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Linux Containers <containers@lists.osdl.org>
Subject: Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch
Date: Fri, 19 Nov 2010 16:33:25 +0100
Message-ID: <4CE698C5.5060806@kernel.org>
In-Reply-To: <04F4899E-B5C7-4BAF-8F2F-05D507A91408@parallels.com>

Hello,

On 11/19/2010 03:36 PM, Kirill Korotaev wrote:
> Can you imagine how many userland APIs are needed to make userspace C/R?
> 
> Do you really want APIs in user-space which allow to:
> - send signals with siginfo attached (kill() doesn't work...)

Doesn't rt_sigqueueinfo() already do this?
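
As a rough illustration of the point above, here is a minimal sketch
that queues a signal with a caller-built siginfo through the raw
rt_sigqueueinfo syscall and reads it back in the same process.  The
helper names queue_signal_with_info() and siginfo_round_trip() are
made up for this example, and the sketch assumes a single-threaded
Linux process.  Note that, as far as I know, the kernel refuses a
si_code >= 0 when the target is another process, so this covers
user-originated siginfo only, not kernel-generated ones.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: queue `sig` to `pid` with a caller-built siginfo
 * carrying `value`.  Returns 0 on success, -1 with errno set on failure. */
static int queue_signal_with_info(pid_t pid, int sig, int value)
{
	siginfo_t info;

	assert(sig > 0);
	memset(&info, 0, sizeof(info));
	info.si_signo = sig;
	info.si_code = SI_QUEUE;	/* negative code: user-originated signal */
	info.si_pid = getpid();
	info.si_uid = getuid();
	info.si_value.sival_int = value;

	return (int)syscall(SYS_rt_sigqueueinfo, pid, sig, &info);
}

/* Hypothetical helper: block `sig`, queue it to ourselves, then dequeue
 * it synchronously.  Returns the received payload, or -1 on any failure
 * (so -1 is not a usable payload in this sketch). */
static int siginfo_round_trip(int sig, int value)
{
	const struct timespec timeout = { .tv_sec = 1, .tv_nsec = 0 };
	sigset_t set;
	siginfo_t got;

	sigemptyset(&set);
	sigaddset(&set, sig);
	if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)
		return -1;
	if (queue_signal_with_info(getpid(), sig, value) != 0)
		return -1;
	if (sigtimedwait(&set, &got, &timeout) != sig)
		return -1;
	if (got.si_code != SI_QUEUE)
		return -1;
	return got.si_value.sival_int;
}
```

Calling siginfo_round_trip(SIGUSR1, 42) should hand back the payload
that was attached when the signal was queued.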

> - read inotify configuration

This would be nice even apart from CR.

> - insert SKB's into socket buffers

Can't we drain kernel buffers?  i.e. stop further writes and wait for
the send-q to drop to zero.
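
A minimal sketch of the "wait for the send queue to empty" idea,
assuming writers have already been stopped by some other means.  It
polls the SIOCOUTQ ioctl, which reports the bytes still sitting in a
socket's send queue.  The helper name wait_for_sendq_drain() and the
1 ms polling interval are assumptions made for this example, not part
of any existing CR implementation.

```c
#include <assert.h>
#include <linux/sockios.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: poll until the send queue of `fd` is empty.
 * Returns 0 once drained, 1 if `timeout_ms` elapsed first, and -1 if
 * the ioctl itself failed (errno is set by ioctl). */
static int wait_for_sendq_drain(int fd, int timeout_ms)
{
	const struct timespec tick = { .tv_sec = 0, .tv_nsec = 1000000 };
	int waited_ms;

	assert(timeout_ms >= 0);
	for (waited_ms = 0; ; waited_ms++) {
		int pending = 0;

		/* SIOCOUTQ stores the queued byte count in `pending`. */
		if (ioctl(fd, SIOCOUTQ, &pending) != 0)
			return -1;
		if (pending == 0)
			return 0;
		if (waited_ms >= timeout_ms)
			return 1;
		nanosleep(&tick, NULL);	/* sleep roughly 1 ms, then re-check */
	}
}
```

A checkpointer could call this on each connected socket after freezing
the tasks that write to it.  A timeout still has to be handled, for
example when the peer has stopped acknowledging data.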

> - setup all TCP/IP parameters for sockets

I _think_ most can be restored by talking to the netfilter module.
Setting the outgoing sequence number might be beneficial though.

> - wait for AIO pending in other processes

I haven't looked at the aio implementation for a while now, but can't
we drain these upon checkpointing and just carry the completion status?
Also, if aio is what you're concerned about, I would say the problem
is mostly solved.

> - setting different statistics counters (like netdev stats etc.)
> and so on...

Why would this matter?

> For every small piece of functionality you will need to export ABI
> and maintain it forever.  It's thousands of APIs! And why the hell
> they are needed in user space at all?

I think it's actually quite the contrary.  Most things are already
visible to userland.  They _have_ to be, and that's why a userland
implementation can already get most things working, with some amount
of hackery, without any change to the kernel.  To me, in-kernel CR
seems to approach the problem from exactly the wrong direction -
rather than dealing with specific exceptions, it creates a completely
new framework which is very foreign and not useful outside of CR.

Also, think about it.  Which one is better?  A kernel which can fully
expose its ABI-visible state to userland, or one which dumps its
internal data structures in binary blobs?  To me, the latter seems
multiple orders of magnitude uglier.

> BTW, HPC case you are talking about is probably the simplest
> one.

Yet, it is one of the most important / relevant use cases.

> Last time I looked into it, IBM Meiosis c/r didn't even bother with
> tty's migration.  In OpenVZ we really do need much more than that
> like autofs/NFS support, preserve statistics, TTYs, etc. etc. etc.

Would it be impossible to preserve autofs/NFS and TTYs from userland?
If so, why?  For statistics, I'm a bit lost.  Why does it matter and,
even if it does, would it justify putting the whole CR inside the kernel?

Thank you.

-- 
tejun
