From: "Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
To: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Cc: containers <containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org>,
    Dave Hansen <dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Subject: Re: [BIG RFC] Filesystem-based checkpoint
Date: Thu, 30 Oct 2008 14:28:17 -0500
Message-ID: <20081030192817.GA16340@us.ibm.com>
In-Reply-To: <4909FAA8.5000107-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>
> I'm not sure why you say it's "un-linux-y" to begin with. But to the
The thing that is un-linux-y is specifically having user-space pass an
fd to the kernel, which the kernel then reads from or writes to. LSMs
went to a lot of pain to avoid doing that for reading policy
configuration at boot.
Of course it's now several years later, and moods and tastes change in
the kernel community, but I suspect it's still frowned upon.
> point, here are my thought:
>
>
> 1. What you suggest is to expose the internal data to user space and
> pull it. Isn't that what cryo tried to do ? And the conclusion was
> that it takes too many interfaces to work out, code in, provide, and
> maintain forever, with issues related to backward compatibility and
> what not. In fact, the conclusion was "let's do a kernel-blob" !
Right, the problem with cryo was that it tried to do the checkpoint and
restart at too fine-grained a level in terms of the kernel-user API.
What Dave is suggesting (as I understand it) is just changing the way
the data is shipped between kernel and user-space, while continuing to
use sys_checkpoint() and sys_restart(). So I think it's a less
fundamental change than you are thinking.
Now maybe eventually he's going to propose something more esoteric where
doing the mount() actually starts the checkpoint (that's where I figured
he'd be heading), but I think it would still be one action on the part
of userspace telling the kernel "do a checkpoint".
(Or am I wrong on that, Dave?)
[...]
(I'll let Dave respond to your other questions i.e. about what you gain)
> If this is only to be able to parallelize checkpoint - then let's discuss
> the problem, not a specific solution.
The specific problem is that you have userspace pass a file fd to the
kernel, and the kernel reads from and writes to it, which is un-linux-y.
> > It enables us to do all the I/O from userspace: no in-kernel
> > sys_read/write().
>
> What's so wrong with in-kernel vfs_read/write() ? You mentioned deadlocks,
It's un-linux-y :)
[...]
> 5. Your suggestions leaves too many details out. Yes, it's a call for
> discussion. But still. Zap, OpenVZ and other systems build on experience
> and working code. We know how to do incremental, live, and other goodies.
> I'm not sure how these would work with your scheme.
Not sure what problems you envision, but take the specific example of
pre-dump to prepare for a quick live migration. I could envision a
pre_checkpoint() system call creating the checkpoint data directory,
starting to dump out the data, and (optimistically) starting to copy
that data over the network. After that, the do_checkpoint() syscall
checks file timestamps and quickly dumps and network-copies only the
data which changed up until the container was frozen.
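
To make the timestamp idea concrete, here is a rough userspace sketch of
the two passes. It is only an illustration: it is not kernel code, it
stands in a local directory copy for the network copy, and the names
snapshot_mtimes(), pre_copy() and final_copy() are made up for this
example (they do not come from any patch set). It also assumes the
checkpoint data directory is flat and holds regular files.

```python
import os
import shutil


def snapshot_mtimes(src_dir):
    """Map each regular file name in src_dir to its mtime in nanoseconds."""
    return {
        entry.name: entry.stat().st_mtime_ns
        for entry in os.scandir(src_dir)
        if entry.is_file()
    }


def copy_files(src_dir, dst_dir, names):
    """Copy the named files from src_dir to dst_dir (stand-in for the network copy)."""
    os.makedirs(dst_dir, exist_ok=True)
    for name in names:
        shutil.copy2(os.path.join(src_dir, name), os.path.join(dst_dir, name))


def pre_copy(src_dir, dst_dir):
    """Optimistic first pass: record timestamps, then copy everything."""
    seen = snapshot_mtimes(src_dir)  # stat before copying, so later writes are noticed
    copy_files(src_dir, dst_dir, sorted(seen))
    return seen


def final_copy(src_dir, dst_dir, seen):
    """Second pass, after the freeze: copy only new files or files whose mtime changed."""
    now = snapshot_mtimes(src_dir)
    changed = sorted(name for name, mtime in now.items() if seen.get(name) != mtime)
    copy_files(src_dir, dst_dir, changed)
    return changed
```

The idea would be to call pre_copy() while the container is still
running, keep the returned dict, and call final_copy() with it once the
container is frozen, so the second pass only moves what changed in
between. Files deleted between the passes are ignored here.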
-serge