From: "Serge E. Hallyn" <serue@us.ibm.com>
To: Oren Laadan <orenl@cs.columbia.edu>
Cc: dave@linux.vnet.ibm.com, containers@lists.linux-foundation.org,
	jeremy@goop.org, linux-kernel@vger.kernel.org, arnd@arndb.de,
	Andrey Mirkin <major@openvz.org>
Subject: Re: [RFC v3][PATCH 1/9] Create syscalls: sys_checkpoint, sys_restart
Date: Thu, 4 Sep 2008 17:03:36 -0500
Message-ID: <20080904220336.GA4528@us.ibm.com>
In-Reply-To: <48C04D7C.6020500@cs.columbia.edu>

Quoting Oren Laadan (orenl@cs.columbia.edu):
> 
> 
> Serge E. Hallyn wrote:
> > Quoting Oren Laadan (orenl@cs.columbia.edu):
> >>
> >> Serge E. Hallyn wrote:
> >>> Quoting Oren Laadan (orenl@cs.columbia.edu):
> >>>> Create trivial sys_checkpoint and sys_restart system calls. They will
> >>>> make it possible to checkpoint and restart an entire container, to and
> >>>> from a checkpoint image file descriptor.
> >>>>
> >>>> The syscalls take a file descriptor (for the image file) and flags as
> >>>> arguments. For sys_checkpoint the first argument identifies the target
> >>>> container; for sys_restart it will identify the checkpoint image.
> >>>>
> >>>> Signed-off-by: Oren Laadan <orenl@cs.columbia.edu>
> >>>> ---
> >> [...]
> >>
> >>>> +/**
> >>>> + * sys_checkpoint - checkpoint a container
> >>>> + * @pid: pid of the container init(1) process
> >>>> + * @fd: file to which to dump the checkpoint image
> >>>> + * @flags: checkpoint operation flags
> >>>> + */
> >>>> +asmlinkage long sys_checkpoint(pid_t pid, int fd, unsigned long flags)
> >>>> +{
> >>>> +	pr_debug("sys_checkpoint not implemented yet\n");
> >>>> +	return -ENOSYS;
> >>>> +}
> >>>> +/**
> >>>> + * sys_restart - restart a container
> >>>> + * @crid: checkpoint image identifier
> >>> So can we compare your api to Andrey's?
> >>>
> >>> You've explained before that crid is used to tie together multiple
> >>> calls to checkpoint, but why do you have to specify it for restart?
> >>> Can't it just come from the fd?  Or, the fd will be passed in
> >>> seek()d to the right position for the data for this task, so the crid
> >>> won't be available there?
> >> I added the 'crid' inside to support a mode of operation in which we
> >> would like the checkpoint data to remain in memory across multiple
> >> system calls. Here are example scenarios:
> >>
> >> 1) We will want to reduce down time by first buffering the checkpoint
> >> image in memory, then resuming the container, and only then writing
> >> the data back to a (the) file descriptor.
> >> So instead of:
> >>   freeze -> checkpoint and write back -> unfreeze
> >> We want:
> >>   freeze -> checkpoint to buffer -> unfreeze -> write back
> >> I envision each of these steps to be a separate invocation of a syscall.
> >> The 'crid' returned by sys_checkpoint() at the 2nd step will be
> >> used to identify that data in the 4th step. (Note that between the
> >> unfreeze and the write-back, another checkpoint may already be taken).
> >>
> >> 2) A task may want to take a checkpoint (e.g. of itself, or a whole
> >> container) and keep that checkpoint in memory; at a later time it may
> >> want to revert to that checkpoint. Moreover, it may keep multiple such
> >> checkpoints (to where it may want to return). 'crid' tells sys_restart
> >> which one to use.
> >>
> >> Note that this 'crid' will in fact be tied to resources that are kept
> >> by the kernel - e.g. references to COW pages (when we add that).
> >> Louis suggested using a specialized FD instead of a numeric 'crid'
> >> (that is: create an anonymous inode and a struct file that represent
> >> that checkpoint in the kernel, and return an FD to it). This approach
> >> has pros and cons relative to 'crid' (see the archives of the containers
> >> mailing list). For now I kept 'crid', but I'm definitely open to changing
> >> it to an FD.
> >>
> >> Oren.
> > 
> > Oh, so the crid identifies one checkpoint inside the file - the single
> > file can store multiple checkpoints?
> 
> Not quite. Let me rephrase the motivation first:
> 
> There are occasions when we would like to keep the checkpoint data in the
> kernel for some (relatively long) time, between syscalls. By "checkpoint
> data" I mean references to memory contents (pages) and all the other data.
> 
> The two scenarios above are two examples: between the syscall to checkpoint
> and the syscall to unfreeze and then write-back the data to a file (first
> example), and for some time until a task may want to "go back in time"
> (second example, useful for ultra fast "undo" for a task).
> 
> Note that in both cases when I say "keep in kernel" I mean before it is
> written to a file, or to the network. Simply in memory, in some efficient
> manner.
> 
> Subsequent syscalls will need to refer to specific checkpoint data that
> is kept in memory - e.g. to write-back to a file-descriptor, or to clean
> up, or to restart from it. (At any single time a specific container may
> have multiple checkpoints associated with it - e.g. because they have
> already been taken but not yet written back to storage).
> 
> Once the data is written back to a file descriptor, the in-kernel data can
> be discarded and cleaned-up.
> 
> The main reason why I want to keep the data in the kernel and not instead
> copy to user space, is efficiency: most of the checkpoint data is the memory
> footprint; by keeping the data in the kernel, one can merely keep a COW
> reference instead of a whole copy of everything (save space and copy time).
> 
> So, if we keep data in the kernel between syscalls, then we must have a
> way to refer to it. The current implementation uses a very simple 'crid'
> value to do that - although, clearly, at the moment it isn't used.
> 
> I hope this explains it better.

Ah, ok.  So we're either using an fd or a crid.
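
To make the crid case concrete, here is a rough user-space sketch of a
table that keeps buffered checkpoints around between calls and hands out
a small integer to refer to each one.  It is only an illustration, not
kernel code and not what the patch does: cr_table, cr_checkpoint_to_buffer()
and cr_write_back() are hypothetical names, and a plain malloc()ed copy
stands in for the COW page references described above.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CR_MAX 8	/* hypothetical limit on pending checkpoints */

/* One buffered checkpoint; a real one would hold page references. */
struct cr_entry {
	int in_use;
	char *data;
	size_t len;
};

static struct cr_entry cr_table[CR_MAX];

/* "checkpoint to buffer": stash the state, return a crid (>= 1) or -errno. */
int cr_checkpoint_to_buffer(const void *state, size_t len)
{
	if (!state && len)
		return -EINVAL;

	for (int i = 0; i < CR_MAX; i++) {
		char *copy;

		if (cr_table[i].in_use)
			continue;
		copy = malloc(len ? len : 1);
		if (!copy)
			return -ENOMEM;
		if (len)
			memcpy(copy, state, len);
		cr_table[i] = (struct cr_entry){ 1, copy, len };
		return i + 1;
	}
	return -ENOSPC;
}

/* "write back": flush the checkpoint named by crid to fd, then discard it. */
int cr_write_back(int crid, int fd)
{
	struct cr_entry *e;
	size_t off = 0;

	if (crid < 1 || crid > CR_MAX || !cr_table[crid - 1].in_use)
		return -EINVAL;
	e = &cr_table[crid - 1];
	assert(e->data != NULL);

	while (off < e->len) {
		ssize_t n = write(fd, e->data + off, e->len - off);

		if (n < 0)
			return -errno;
		off += (size_t)n;
	}

	/* Once written back, the in-memory copy can be dropped. */
	free(e->data);
	memset(e, 0, sizeof(*e));
	return 0;
}
```

Two checkpoints taken back to back get distinct crids, and either one
can be written back first; a second write-back of the same crid fails
with -EINVAL because the entry is already gone.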

Personally I'd then prefer two syscalls, which are wrappers around
a more flexible in-kernel api.  That way we can start with just
	sys_restart(int fd, long flags)
and add
	sys_restart_crid(int crid, long flags)
later.
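
Roughly, the split could look like the sketch below.  This is a
user-space mock-up under my own assumptions, not code from the patch:
struct cr_source and do_restart() are hypothetical names for the
"more flexible in-kernel api", and the common function just returns
-ENOSYS like the stubs in the patch do.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical: where the checkpoint data comes from. */
enum cr_source_type {
	CR_SOURCE_FD,		/* image read from a file descriptor */
	CR_SOURCE_CRID,		/* image still held in memory, named by crid */
};

struct cr_source {
	enum cr_source_type type;
	int id;			/* an fd or a crid, depending on type */
};

/* Hypothetical common entry point shared by both syscalls. */
static long do_restart(const struct cr_source *src, long flags)
{
	assert(src != NULL);
	(void)flags;		/* no flags are defined in this sketch */

	if (src->id < 0)
		return src->type == CR_SOURCE_FD ? -EBADF : -EINVAL;

	return -ENOSYS;		/* restart itself is not implemented here */
}

/* The variant we can start with: restart from an image fd. */
long sys_restart(int fd, long flags)
{
	struct cr_source src = { .type = CR_SOURCE_FD, .id = fd };

	return do_restart(&src, flags);
}

/* The variant that can be added later: restart from an in-memory crid. */
long sys_restart_crid(int crid, long flags)
{
	struct cr_source src = { .type = CR_SOURCE_CRID, .id = crid };

	return do_restart(&src, flags);
}
```

Each wrapper only records which kind of identifier it was given, so
adding the crid variant later does not change the signature of the fd
variant.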

-serge
