Linux Container Development
From: ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman)
To: Dave Hansen <dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Cc: containers <containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org>
Subject: Re: [BIG RFC] Filesystem-based checkpoint
Date: Thu, 30 Oct 2008 16:33:16 -0700
Message-ID: <m163n9y7yb.fsf@frodo.ebiederm.org>
In-Reply-To: <1225219047.12673.182.camel@nimitz> (Dave Hansen's message of "Tue, 28 Oct 2008 11:37:27 -0700")

Dave Hansen <dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> writes:

> I hate the syscall.  It's a very un-Linux-y way of doing things.  There,
> I said it.  Here's an alternative.  It still uses the syscall to
> initiate things, but it uses debugfs to transport the data instead.
> This is just a concept demonstration.  It doesn't actually work, and I
> wouldn't be using debugfs in practice.

A syscall is a very Linux-y way to do it.

If you called it a core dump instead of a checkpoint, you would have exactly the same set
of issues.

I don't understand why we are calling vfs_write() instead of file->f_op->write().

> System calls in Linux are fast.  Doing lots of them is not a problem.
> If it becomes one, we can always export a condensed version of this
> format next to the expanded one, kinda like ftrace does.  Atomicity with
> this approach is also not a problem.  The system call in this approach
> doesn't return until the checkpoint is completely written out.

Extra copies are a problem for something (memory) that you want to
transfer quickly and efficiently.
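A user-space sketch of the extra-copy concern. This is not code from the thread: the names `dump_with_staging`, `dump_direct` and the `bytes_copied` counter are hypothetical, and an ordinary buffer and file descriptor stand in for process memory and the checkpoint image. The staged variant, which resembles the "copy all data, store it in-kernel, dump that out later" idea quoted further down, pays one full extra copy of every byte before anything reaches the output.

```c
#define _XOPEN_SOURCE 700 /* for fileno() and write() under strict -std modes */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Bytes memcpy'd into intermediate buffers before reaching the fd. */
static size_t bytes_copied;

/* Write all of buf to fd, retrying on short writes. Returns 0 on success. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Staged dump (hypothetical): snapshot the region into a separate buffer,
 * then stream the snapshot out. Costs one extra full copy of the data. */
static int dump_with_staging(int fd, const char *mem, size_t len)
{
    char *stage = malloc(len);
    if (stage == NULL)
        return -1;
    memcpy(stage, mem, len);
    bytes_copied += len;
    int rc = write_all(fd, stage, len);
    free(stage);
    return rc;
}

/* Direct dump (hypothetical): hand the source region straight to write(). */
static int dump_direct(int fd, const char *mem, size_t len)
{
    return write_all(fd, mem, len);
}
```

Both variants produce identical output; the difference is only the intermediate copy, which scales with the size of the memory being checkpointed.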

Reading the memory of another process is a problem, to the point
that the /proc/<pid>/mem interface has been removed from the kernel.
  
> This lets userspace pick and choose what parts of the checkpoint it
> cares about.  It enables us to do all the I/O from userspace: no
> in-kernel sys_read/write().  I think this interface is much more
> flexible than a plain syscall.

Then get together with Roland McGrath and build the next-generation
user-space debugging interface.

> Want to do a fast checkpoint?  Fine, copy all data, use a lot of memory,
> store it in-kernel.  Dump that out when the filesystem is accessed.
> Destroy it when userspace asks.

> So, why not?

Besides the part where we create a bunch of questionable interfaces
that we would need to support forever?

Ultimately the question is how you do checkpoint/restore, and I just
don't see that happening with a filesystem interface.  There are way,
way, way too many dangerous syscalls that are only needed for one thing.

Checkpoint/restore is an atomic operation, and filesystems suck at building
high-level atomic primitives.
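A minimal sketch of the kind of atomic primitive a POSIX filesystem does offer, assuming a checkpoint image that fits in a single file. This is not code from the thread, and `publish_atomically` is a hypothetical helper: writing to a temporary file and rename(2)-ing it over the target makes one path switch atomically from the old contents to the new. Nothing comparable exists for state spread across many files or a whole directory tree, which is one way to read the point above.

```c
#define _XOPEN_SOURCE 700 /* for mkstemp(), fsync(), mkdtemp() under strict -std modes */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: atomically replace dir/name with len bytes of data.
 * Readers see either the old file or the complete new one, never a partial
 * image, because rename(2) within one filesystem replaces the target
 * atomically. This covers ONE path only; a multi-file image gets no such
 * guarantee. Returns 0 on success, -1 on error. */
static int publish_atomically(const char *dir, const char *name,
                              const char *data, size_t len)
{
    char tmp[4096], final_path[4096];
    int n1 = snprintf(tmp, sizeof tmp, "%s/.%s.XXXXXX", dir, name);
    int n2 = snprintf(final_path, sizeof final_path, "%s/%s", dir, name);
    if (n1 < 0 || n2 < 0 || (size_t)n1 >= sizeof tmp ||
        (size_t)n2 >= sizeof final_path)
        return -1;

    int fd = mkstemp(tmp); /* temp file in the same directory as the target */
    if (fd < 0)
        return -1;

    size_t off = 0;
    while (off < len) {
        ssize_t n = write(fd, data + off, len - off);
        if (n < 0)
            goto fail;
        off += (size_t)n;
    }
    if (fsync(fd) != 0) /* make the contents durable before the rename */
        goto fail;
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }
    if (rename(tmp, final_path) != 0) { /* the single atomic step */
        unlink(tmp);
        return -1;
    }
    return 0;

fail:
    close(fd);
    unlink(tmp);
    return -1;
}
```

The directory itself is not fsync'd here, so this sketch guarantees atomic visibility of the new image rather than full crash durability.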

Eric
