* [RFC][PATCH 0/2] CR: save/restore a single, simple task
@ 2008-07-30  3:24 Oren Laadan
From: Oren Laadan @ 2008-07-30  3:24 UTC
  To: Linux Containers


In the recent mini-summit at OLS 2008 and the following days it was
agreed to tackle checkpoint/restart (CR) by beginning with a very
simple case: save and restore a single task with a simple memory
layout, disregarding other task state such as files, signals, etc.

Following these discussions I coded a prototype that does exactly
that, as a starter. The code adds two system calls, sys_checkpoint
and sys_restart, that a task can call to save and restore its state,
respectively. It also demonstrates how the checkpoint image file can
be formatted and shows its nested nature (e.g. cr_write_mm()
-> cr_write_vma() nesting).
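
To make the nesting concrete, here is a minimal userspace sketch of
such a layout. It is not the format used by the patch: the record
types, struct layouts and function names (rec_hdr, write_rec,
write_mm, write_vma) are all hypothetical, and it only assumes that
every record is a small header followed by its payload, with the mm
record followed by one record per vma.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record types; the patch defines its own. */
enum rec_type { REC_MM = 1, REC_VMA = 2 };

/* Hypothetical fixed-size header that precedes every payload. */
struct rec_hdr {
	uint32_t type;
	uint32_t len;		/* payload bytes that follow */
};

/* Hypothetical payloads, reduced to a few fields for illustration. */
struct mm_rec  { uint32_t nr_vmas; uint32_t pad; };
struct vma_rec { uint64_t start, end; uint32_t flags; uint32_t pad; };

/* Write one record: header first, then the payload. */
static int write_rec(FILE *f, uint32_t type, const void *buf, uint32_t len)
{
	struct rec_hdr h = { type, len };

	assert(f != NULL);
	if (fwrite(&h, sizeof(h), 1, f) != 1)
		return -1;
	if (len && fwrite(buf, len, 1, f) != 1)
		return -1;
	return 0;
}

static int write_vma(FILE *f, const struct vma_rec *v)
{
	return write_rec(f, REC_VMA, v, sizeof(*v));
}

/* The mm record is followed by one nested record per vma. */
static int write_mm(FILE *f, const struct vma_rec *vmas, uint32_t n)
{
	struct mm_rec m = { n, 0 };
	uint32_t i;

	if (write_rec(f, REC_MM, &m, sizeof(m)) < 0)
		return -1;
	for (i = 0; i < n; i++)
		if (write_vma(f, &vmas[i]) < 0)
			return -1;
	return 0;
}
```

A reader of such a stream can walk it by reading a header, then len
bytes of payload, and recursing on nr_vmas when it sees an mm record.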

The state that is saved/restored is the following:
* some of the task_struct
* some of the thread_struct and thread_info
* the cpu state (including FPU)
* the memory address space

[The patch is against commit fb2e405fc1fc8b20d9c78eaa1c7fd5a297efde43
of Linus's tree (uhhh.. don't ask why), but it applies against
tonight's head too].

In the current code, sys_checkpoint checkpoints the current task,
although the logic exists to checkpoint other tasks (not in the
checkpointee's execution context). A simple loop will extend this to
handle multiple processes. sys_restart restarts the current task, and
with multiple tasks each task will call the syscall independently.
(Actually, to checkpoint outside the context of a task, it is also
necessary to handle restart-block logic when saving/restoring the
thread data).

It takes longer to describe what isn't implemented or supported by
this prototype ... basically everything that isn't as simple as the
above.

As for containers - since we still don't have a representation for a
container, this patch has no notion of a container. The tests for
consistent namespaces (and isolation) are also omitted.

Below are two example programs: one uses checkpoint (called ckpt) and
one uses restart (called rstr). Execute like this (as a superuser):

orenl:~/test$ ./ckpt > out.1
hello, world!  (ret=1)		<-- sys_checkpoint returns positive id
 				<-- ctrl-c
orenl:~/test$ ./ckpt > out.2
hello, world!  (ret=2)
 				<-- ctrl-c
orenl:~/test$ ./rstr < out.1
hello, world!  (ret=0)		<-- sys_restart returns 0

(if you check the output of ps, you'll see that "rstr" changed its
name to "ckpt", as expected).
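
The transcript above suggests a fork()-like convention: the same code
path sees a positive id right after a checkpoint, and 0 when it is
resumed from the image by a restart. A small sketch of how a caller
might tell the cases apart follows; the names cr_outcome, cr_classify
and cr_report are hypothetical, and only the return convention is
taken from the transcript.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical names; only the return convention comes from the transcript. */
enum cr_outcome { CR_FAILED, CR_CHECKPOINTED, CR_RESTARTED };

/* Interpret the value seen after the checkpoint call returns. */
static enum cr_outcome cr_classify(long ret)
{
	if (ret < 0)
		return CR_FAILED;	/* syscall error, see errno */
	if (ret == 0)
		return CR_RESTARTED;	/* resumed from a saved image */
	return CR_CHECKPOINTED;		/* image written, ret is its id */
}

/* Print a one-line description of the outcome. */
static void cr_report(FILE *out, long ret)
{
	assert(out != NULL);
	switch (cr_classify(ret)) {
	case CR_FAILED:
		fprintf(out, "checkpoint failed\n");
		break;
	case CR_RESTARTED:
		fprintf(out, "resumed from image\n");
		break;
	case CR_CHECKPOINTED:
		fprintf(out, "image written, id=%ld\n", ret);
		break;
	}
}
```

In ckpt.c this would be called as cr_report(stderr, ret) right after
the syscall, in place of the plain fprintf.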

Hoping this will accelerate the discussion. Comments are welcome.
Let the fun begin :)

Oren.


============================== ckpt.c ================================

#define _GNU_SOURCE        /* or _BSD_SOURCE or _SVID_SOURCE */

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <asm/unistd_32.h>
#include <sys/syscall.h>

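int main(int argc, char *argv[])
{
	pid_t pid = getpid();
	int ret;

	/* write the checkpoint image of this task to stdout */
	ret = syscall(__NR_checkpoint, pid, STDOUT_FILENO, 0);
	if (ret < 0)
		perror("checkpoint");

	/* reached after a checkpoint (ret > 0) and after a restart (ret == 0) */
	fprintf(stderr, "hello, world!  (ret=%d)\n", ret);

	/* spin until interrupted with ctrl-c */
	while (1)
		;

	return 0;
}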

============================== rstr.c ================================

#define _GNU_SOURCE        /* or _BSD_SOURCE or _SVID_SOURCE */

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <asm/unistd_32.h>
#include <sys/syscall.h>

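int main(int argc, char *argv[])
{
	pid_t pid = getpid();
	int ret;

	/* read a checkpoint image from stdin and restore it into this task */
	ret = syscall(__NR_restart, pid, STDIN_FILENO, 0);
	if (ret < 0)
		perror("restart");

	/* on success, execution resumes in the restored image, not here */
	printf("should not reach here !\n");

	return 0;
}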


