* [RFC 00/10] container-based checkpoint/restart prototype
@ 2011-02-28 23:40 ` ntl
From: ntl-e+AXbWqSrlAAvxtiuMwx3w @ 2011-02-28 23:40 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Nathan Lynch

From: Nathan Lynch <ntl-e+AXbWqSrlAAvxtiuMwx3w@public.gmane.org>

Checkpoint/restart is a facility by which one can save the state of a
job to a file and restart it later under the right conditions.  This
is a C/R prototype intended to illustrate how well (or poorly) it
would fit into the Linux kernel.  It is basically a fork of the
"linux-cr" patch set by Oren Laadan and others, but it is more limited
in scope and has a different system call interface.  I believe what I
have here is a decent starting point for a C/R implementation that can
go upstream, but I'm releasing early with the hope of receiving some
feedback/review on the overall approach before pursuing it too much
further.

The intended users are HPC sites, large homogeneous clusters, and other
environments with long-running jobs that cannot easily be interrupted
without losing work, for whatever reason (perhaps you've misplaced the
source code for your program and can't modify it to checkpoint and
restore its own state).  In these situations checkpoint/restart
provides a rollback mechanism to mitigate the effects of
hardware/system failures as well as a means of migrating jobs between
nodes.


How it works:

Only a process with PID 1 ("init") can call checkpoint or restart.

Checkpoint freezes the rest of the pidns and goes about dumping the
state of all the other tasks in the PID namespace to the specified
file descriptor.  The state of the caller is not recorded.

Before calling restart, init is expected to set up the environment
(mounts, net devices and such) in accord with the checkpointed job's
"expectations".  The restart system call recreates the task tree
(except for init itself) and the tasks resume execution; init can
then wait(2) for tasks to exit in the normal fashion.
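
The checkpoint rules above can be illustrated with a small userspace
toy model.  This is only a sketch and not code from the patch set: the
names toy_task, toy_pidns and toy_checkpoint are hypothetical, a stdio
stream stands in for the file descriptor, and the "state" written is
just a PID.  What happens to the frozen tasks after the dump is not
modelled.

```c
/*
 * Toy model of the checkpoint rules: only PID 1 may checkpoint, the
 * other tasks in the PID namespace are frozen and dumped, and the
 * caller's own state is not recorded.  All names are hypothetical.
 */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum toy_state { TOY_RUNNING, TOY_FROZEN };

struct toy_task {
	int pid;		/* PID as seen inside the namespace */
	enum toy_state state;
};

struct toy_pidns {
	struct toy_task *tasks;
	size_t ntasks;
};

/*
 * Returns the number of tasks dumped to 'out', or -1 if the caller
 * is not PID 1 (in which case nothing is frozen or written).
 */
int toy_checkpoint(struct toy_pidns *ns, int caller_pid, FILE *out)
{
	size_t i;
	int dumped = 0;

	if (caller_pid != 1)
		return -1;	/* only init may checkpoint */

	/* Freeze the rest of the namespace, leaving the caller running. */
	for (i = 0; i < ns->ntasks; i++)
		if (ns->tasks[i].pid != caller_pid)
			ns->tasks[i].state = TOY_FROZEN;

	/* Dump every task except the caller. */
	for (i = 0; i < ns->ntasks; i++) {
		if (ns->tasks[i].pid == caller_pid)
			continue;
		assert(ns->tasks[i].state == TOY_FROZEN);
		fprintf(out, "task pid=%d\n", ns->tasks[i].pid);
		dumped++;
	}
	return dumped;
}
```

With tasks 1, 2 and 3 in the namespace, a call made as PID 2 is
rejected and changes nothing, while a call made as PID 1 freezes and
dumps tasks 2 and 3 and leaves task 1 untouched.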


Limitations:

This implementation is limited to containers by design (and this
prototype is limited to checkpoint/restart of a single simple task).
A Linux "container" doesn't have a universally agreed upon definition,
but in this context we are referring to a group of processes for which
the PID namespace (and possibly other namespaces) is isolated from the
rest of the system (see clone(2)).  This is the tradeoff we ask users
to make - the ability to C/R and migrate is provided in exchange for
accepting some isolation and slightly reduced ease of use.  A tool
such as lxc (http://lxc.sourceforge.net) can be used to isolate jobs.
A patch against lxc is available which adds C/R capability.

The user must ensure that a restarted job's view of the filesystem is
effectively the same as it was at the time of checkpoint.

Processes that map device memory and other such hardware-dependent
things will probably not be supported.


To do:

Multiple tasks
Signal state
System call restart blocks
More code cleanup/simplification
Other architecture support
System V IPC
Network/sockets
And much more


 Documentation/filesystems/vfs.txt  |   13 +-
 arch/x86/Kconfig                   |    4 +
 arch/x86/include/asm/checkpoint.h  |   17 +
 arch/x86/include/asm/elf.h         |    5 +
 arch/x86/include/asm/ldt.h         |    7 +
 arch/x86/include/asm/unistd_32.h   |    4 +-
 arch/x86/kernel/Makefile           |    2 +
 arch/x86/kernel/checkpoint.c       |  677 +++++++++++++++++++++++++++
 arch/x86/kernel/syscall_table_32.S |    2 +
 arch/x86/vdso/vdso32-setup.c       |   25 +-
 drivers/char/mem.c                 |    6 +
 drivers/char/random.c              |    6 +
 fs/Makefile                        |    1 +
 fs/aio.c                           |   27 ++
 fs/checkpoint.c                    |  695 +++++++++++++++++++++++++++
 fs/exec.c                          |    2 +-
 fs/ext2/dir.c                      |    3 +
 fs/ext2/file.c                     |    6 +
 fs/ext3/dir.c                      |    3 +
 fs/ext3/file.c                     |    3 +
 fs/ext4/dir.c                      |    3 +
 fs/ext4/file.c                     |    6 +
 fs/fcntl.c                         |   21 +-
 fs/locks.c                         |   35 ++
 include/linux/aio.h                |    2 +
 include/linux/checkpoint.h         |  347 ++++++++++++++
 include/linux/fs.h                 |   15 +
 include/linux/magic.h              |    3 +
 include/linux/mm.h                 |   15 +
 init/Kconfig                       |    2 +
 kernel/Makefile                    |    1 +
 kernel/checkpoint/Kconfig          |   15 +
 kernel/checkpoint/Makefile         |    9 +
 kernel/checkpoint/checkpoint.c     |  437 +++++++++++++++++
 kernel/checkpoint/objhash.c        |  368 +++++++++++++++
 kernel/checkpoint/restart.c        |  651 ++++++++++++++++++++++++++
 kernel/checkpoint/sys.c            |  208 +++++++++
 kernel/sys_ni.c                    |    4 +
 mm/Makefile                        |    1 +
 mm/checkpoint.c                    |  906 ++++++++++++++++++++++++++++++++++++
 mm/filemap.c                       |    4 +
 mm/mmap.c                          |    3 +
 42 files changed, 4549 insertions(+), 15 deletions(-)
 create mode 100644 arch/x86/include/asm/checkpoint.h
 create mode 100644 arch/x86/kernel/checkpoint.c
 create mode 100644 fs/checkpoint.c
 create mode 100644 include/linux/checkpoint.h
 create mode 100644 kernel/checkpoint/Kconfig
 create mode 100644 kernel/checkpoint/Makefile
 create mode 100644 kernel/checkpoint/checkpoint.c
 create mode 100644 kernel/checkpoint/objhash.c
 create mode 100644 kernel/checkpoint/restart.c
 create mode 100644 kernel/checkpoint/sys.c
 create mode 100644 mm/checkpoint.c

-- 
1.7.4

