From: "Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
To: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Cc: containers <containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org>,
Dave Hansen
<dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Subject: Re: [BIG RFC] Filesystem-based checkpoint
Date: Thu, 30 Oct 2008 14:28:17 -0500
Message-ID: <20081030192817.GA16340@us.ibm.com>
In-Reply-To: <4909FAA8.5000107-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>
> I'm not sure why you say it's "un-linux-y" to begin with. But to the
The thing that is un-linux-y is specifically having user-space pass an
fd to the kernel from which it reads/writes. LSMs had to go to a lot of
pain to avoid doing that for reading policy configuration at boot.
Of course it's now several years later, and moods and tastes change in
the kernel community, but I suspect it's still frowned upon.
> point, here are my thoughts:
>
>
> 1. What you suggest is to expose the internal data to user space and
> pull it. Isn't that what cryo tried to do? And the conclusion was
> that it takes too many interfaces to work out, code in, provide, and
> maintain forever, with issues related to backward compatibility and
> what not. In fact, the conclusion was "let's do a kernel-blob" !
Right, the problem with cryo was that it tried to do the checkpoint and
restart itself, at too fine-grained a level in terms of the kernel-user
API.
What Dave is suggesting (as I understand it) is just changing the way
the data is shipped between kernel and user-space, while continuing to
use sys_checkpoint() and sys_restart(). So I think it's a less
fundamental change than you are thinking.
Now maybe eventually he's going to propose something more esoteric where
doing the mount() actually starts the checkpoint (that's where I figured
he'd be heading), but I think it would still be one action on the part
of userspace telling the kernel "do a checkpoint".
(Or am I wrong on that, Dave?)
[...]
(I'll let Dave respond to your other questions i.e. about what you gain)
> If this is only to be able to parallelize checkpoint - then let's discuss
> the problem, not a specific solution.
The specific problem is that you have userspace pass a file fd to the
kernel, and the kernel reading/writing to it, which is un-linux-y.
> > It enables us to do all the I/O from userspace: no in-kernel
> > sys_read/write().
>
> What's so wrong with in-kernel vfs_read/write() ? You mentioned deadlocks,
It's un-linux-y :)
[...]
> 5. Your suggestion leaves too many details out. Yes, it's a call for
> discussion. But still. Zap, OpenVZ and other systems build on experience
> and working code. We know how to do incremental, live, and other goodies.
> I'm not sure how these would work with your scheme.
Not sure what problems you envision, but take the specific example of a
pre-dump to prepare for a quick live migration. I could envision a
pre_checkpoint() system call creating the checkpoint data directory,
starting to dump out the data, and (optimistically) starting to copy
that data over the network. After that, the do_checkpoint() syscall
checks file timestamps and quickly dumps and network-copies only the
data which has changed up until the container was frozen.
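
To make the timestamp check concrete, here is a rough userspace sketch
in Python of the "copy only what changed since the last pass" step. It
is only an illustration under assumptions: copy_pass() is a hypothetical
helper, a local directory copy stands in for the network copy, and
nothing here reflects an actual pre_checkpoint()/do_checkpoint()
interface, which does not exist.

```python
import os
import shutil


def copy_pass(src_dir, dst_dir, last_pass=-1):
    """Copy files under src_dir whose mtime (ns) is newer than last_pass.

    Hypothetical helper: returns (sorted relative paths copied,
    newest mtime seen), so the caller can feed the second value
    into the next pass.
    """
    copied, newest = [], last_pass
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            src = os.path.join(root, name)
            mtime = os.stat(src).st_mtime_ns
            if mtime <= last_pass:
                continue  # unchanged since the previous pass
            rel = os.path.relpath(src, src_dir)
            dst = os.path.join(dst_dir, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # stand-in for the network copy
            copied.append(rel)
            newest = max(newest, mtime)
    return sorted(copied), newest
```

The first call (the pre-dump) would copy everything; the second call,
made once the container is frozen, would be passed the mark returned by
the first and so would only move the files touched in between.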
-serge