* [RFC][PATCH 0/2] CR: save/restore a single, simple task
@ 2008-07-30 3:24 Oren Laadan
0 siblings, 1 reply; 14+ messages in thread
From: Oren Laadan @ 2008-07-30 3:24 UTC (permalink / raw)
To: Linux Containers
In the recent mini-summit at OLS 2008 and the following days it was
agreed to tackle the checkpoint/restart (CR) by beginning with a very
simple case: save and restore a single task, with simple memory
layout, disregarding other task state such as files, signals etc.
Following these discussions I coded a prototype that can do exactly
that, as a starter. This code adds two system calls - sys_checkpoint
and sys_restart - that a task can call to save and restore its state
respectively. It also demonstrates how the checkpoint image file can
be formatted, and shows its nested nature (e.g. cr_write_mm()
-> cr_write_vma() nesting).
The state that is saved/restored is the following:
* some of the task_struct
* some of the thread_struct and thread_info
* the cpu state (including FPU)
* the memory address space
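To make the nested image layout concrete (cr_write_mm() calling
cr_write_vma()), here is a minimal user-space sketch. It assumes every
record is preceded by a small type/length header. The record types, the
struct layouts, and the helper names img_put(), write_mm() and write_vma()
are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record types; the real patch defines its own. */
enum { REC_MM = 1, REC_VMA = 2 };

/* Hypothetical fixed-size header that precedes every record. */
struct rec_hdr {
    uint32_t type;  /* one of REC_* */
    uint32_t len;   /* payload bytes that follow the header */
};

/* Illustrative payloads, far smaller than real kernel state. */
struct vma_rec { uint64_t start, end; uint32_t prot; uint32_t pad; };
struct mm_rec  { uint32_t nr_vmas; uint32_t pad; };

/* In-memory stand-in for the checkpoint file descriptor. */
struct image { unsigned char buf[4096]; size_t used; };

/* Append one header+payload record; returns 0, or -1 if out of space. */
static int img_put(struct image *img, uint32_t type, const void *p,
                   uint32_t len)
{
    struct rec_hdr h = { type, len };

    if (img->used + sizeof(h) + len > sizeof(img->buf))
        return -1;
    memcpy(img->buf + img->used, &h, sizeof(h));
    img->used += sizeof(h);
    memcpy(img->buf + img->used, p, len);
    img->used += len;
    return 0;
}

/* Write a single vma record. */
static int write_vma(struct image *img, const struct vma_rec *v)
{
    assert(v->start < v->end);
    return img_put(img, REC_VMA, v, sizeof(*v));
}

/* The mm record comes first, then one nested record per vma. */
static int write_mm(struct image *img, const struct vma_rec *vmas,
                    uint32_t n)
{
    struct mm_rec m = { n, 0 };
    uint32_t i;

    if (img_put(img, REC_MM, &m, sizeof(m)) < 0)
        return -1;
    for (i = 0; i < n; i++)
        if (write_vma(img, &vmas[i]) < 0)
            return -1;
    return 0;
}
```

Because the mm record carries the vma count, a reader can consume the
stream strictly front to back, which matches the streaming assumption
discussed later in the thread.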
[The patch is against commit fb2e405fc1fc8b20d9c78eaa1c7fd5a297efde43
of Linus's tree (uhhh.. don't ask why), but it applies against tonight's
head too].
In the current code, sys_checkpoint will checkpoint the current task,
although the logic exists to checkpoint other tasks (not in the
checkpointee's execution context). A simple loop will extend this to
handle multiple processes. sys_restart restarts the current task, and
with multiple tasks each task will call the syscall independently.
(Actually, to checkpoint outside the context of a task, it is also
necessary to handle restart-block logic when saving/restoring the
thread data).
It takes longer to describe what isn't implemented or supported by
this prototype ... basically everything that isn't as simple as the
above.
As for containers - since we still don't have a representation for a
container, this patch has no notion of a container. The tests for
consistent namespaces (and isolation) are also omitted.
Below are two example programs: one uses checkpoint (called ckpt) and
one uses restart (called rstr). Execute like this (as a superuser):
    orenl:~/test$ ./ckpt > out.1
    hello, world! (ret=1)       <-- sys_checkpoint returns positive id
                                <-- ctrl-c
    orenl:~/test$ ./ckpt > out.2
    hello, world! (ret=2)
                                <-- ctrl-c
    orenl:~/test$ ./rstr < out.1
    hello, world! (ret=0)       <-- sys_restart returns 0
(if you check the output of ps, you'll see that "rstr" changed its
name to "ckpt", as expected).
Hoping this will accelerate the discussion. Comments are welcome.
Let the fun begin :)
Oren.
============================== ckpt.c ================================
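#define _GNU_SOURCE        /* or _BSD_SOURCE or _SVID_SOURCE */

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <asm/unistd_32.h>      /* __NR_checkpoint: needs the patched kernel headers */
#include <sys/syscall.h>

int main(int argc, char *argv[])
{
	pid_t pid = getpid();
	int ret;

	/* write the checkpoint image of this task to stdout */
	ret = syscall(__NR_checkpoint, pid, STDOUT_FILENO, 0);
	if (ret < 0)
		perror("checkpoint");

	/* ret > 0 after a checkpoint, ret == 0 when resumed by sys_restart */
	fprintf(stderr, "hello, world! (ret=%d)\n", ret);

	/* spin until interrupted (ctrl-c) */
	while (1)
		;

	return 0;
}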
============================== rstr.c ================================
#define _GNU_SOURCE        /* or _BSD_SOURCE or _SVID_SOURCE */

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <asm/unistd_32.h>      /* __NR_restart: needs the patched kernel headers */
#include <sys/syscall.h>

int main(int argc, char *argv[])
{
	pid_t pid = getpid();
	int ret;

	/* read the checkpoint image from stdin and restart from it */
	ret = syscall(__NR_restart, pid, STDIN_FILENO, 0);
	if (ret < 0)
		perror("restart");

	/* on success, execution resumes inside the checkpointed program */
	printf("should not reach here !\n");

	return 0;
}
* Re: [RFC][PATCH 0/2] CR: save/restore a single, simple task
@ 2008-07-30 21:35 Serge E. Hallyn
From: Serge E. Hallyn @ 2008-07-30 21:35 UTC (permalink / raw)
To: Oren Laadan; +Cc: Linux Containers

Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> In the current code, sys_checkpoint will checkpoint the current task,
> although the logic exists to checkpoint other tasks (not in the
> checkpointee's execution context). A simple loop will extend this to
> handle multiple processes. sys_restart restarts the current tasks, and
> with multiple tasks each task will call the syscall independently.

I assume that approach worked in Zap, so there must be a simple solution
to this, but I don't see how having each process in a container
independently call sys_restart works for sharing. Oh, or is that where
a 'container restart context' comes in? An nsproxy has a pointer to a
checkpoint/restart context which the first task creates and all tasks
reference and update? So task 5 created its mm_struct, task 6 is
supposed to use the same mm_struct, so it finds that out from the
context? I wonder whether that would start to become complicated
when checkpointing nested containers.

So I still prefer the idea that the init process calls restart, and that
creates all the tasks in the container and rebuilds them. But you have
code, so you win :)

Anyway I'm still reading through patch 2. It looks great to me - the
only comments I have written so far are:
	1. why not just store LINUX_VERSION_CODE in the header instead
	   of breaking it up
	2. the x86-specific code should of course go into arch-specific
	   directories, but

neither of which really is worth the bother right now imo :)
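Serge's first comment can be illustrated with a short sketch that compares
a header storing the version as separate fields with one storing a single
packed code. VERSION_CODE() below mirrors the classic KERNEL_VERSION()
encoding from <linux/version.h>, (a << 16) + (b << 8) + c. The two header
structs and the helper names are hypothetical and do not reflect the
patch's actual header.

```c
#include <assert.h>
#include <stdint.h>

/* Same encoding as the classic KERNEL_VERSION() in <linux/version.h>. */
#define VERSION_CODE(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hypothetical header variants; neither is copied from the patch. */
struct hdr_split  { uint16_t major, patchlevel, sublevel; };
struct hdr_packed { uint32_t version_code; };

/* Fold the three fields into one fixed-width code. */
static uint32_t pack_version(const struct hdr_split *s)
{
    /* the encoding leaves only 8 bits for each of the low fields */
    assert(s->patchlevel < 256 && s->sublevel < 256);
    return VERSION_CODE((uint32_t)s->major, (uint32_t)s->patchlevel,
                        (uint32_t)s->sublevel);
}

/* Recover the three fields from a packed code. */
static struct hdr_split unpack_version(uint32_t code)
{
    struct hdr_split s;

    s.major = (uint16_t)(code >> 16);
    s.patchlevel = (uint16_t)((code >> 8) & 0xff);
    s.sublevel = (uint16_t)(code & 0xff);
    return s;
}

/* Restart-side check: accept only an image from the same version. */
static int version_matches(uint32_t image_code, uint32_t running_code)
{
    return image_code == running_code;
}
```

With a packed code the restart-side check is a single integer comparison,
and a uint32_t field has the same size on 32-bit and 64-bit builds.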
* Re: [RFC][PATCH 0/2] CR: save/restore a single, simple task
@ 2008-07-30 21:40 Dave Hansen
From: Dave Hansen @ 2008-07-30 21:40 UTC (permalink / raw)
To: Serge E. Hallyn; +Cc: Linux Containers

On Wed, 2008-07-30 at 16:35 -0500, Serge E. Hallyn wrote:
> So task 5 created its mm_struct, task 6 is
> supposed to use the same mm_struct, so it finds that out from the
> context? I wonder whether that would start to become complicated
> when checkpointing nested containers.

It also doesn't fit well with the nsproxy idea. It would be very hard
to tell which nsproxies should be shared until the entire restart has
been completed and we've been able to figure out which ones are the
same.

This is just a coding/implementation issue, but I think it does reveal a
difference in ideology between these patches and the way that the kernel
works up until now. :)

-- Dave
* Re: [RFC][PATCH 0/2] CR: save/restore a single, simple task
@ 2008-07-31 0:37 Oren Laadan
From: Oren Laadan @ 2008-07-31 0:37 UTC (permalink / raw)
To: Dave Hansen; +Cc: Linux Containers

Dave Hansen wrote:
> On Wed, 2008-07-30 at 16:35 -0500, Serge E. Hallyn wrote:
>> So task 5 created its mm_struct, task 6 is
>> supposed to use the same mm_struct, so it finds that out from the
>> context? I wonder whether that would start to become complicated
>> when checkpointing nested containers.
>
> It also doesn't fit well with the nsproxy idea. It would be very hard
> to tell which nsproxies should be shared until the entire restart has
> been completed and we've been able to figure out which ones are the
> same.

I'm not sure I fully understand the problem that you describe, however, I
reckon that nsproxies are, basically, yet-another shared object between
tasks in the kernel. As such, the CR logic will treat them like other
shared objects: the first time one is found, its state will be saved; the
next time the same one is found, only an identifier will be saved.

That definitely means that, yes, the state within nsproxies as kernel
resources will have to be saved as part of the checkpoint.

Oren.

> This is just a coding/implementation issue, but I think it does reveal a
> difference in ideology between these patches and the way that the kernel
> works up until now. :)
>
> -- Dave
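The "save the state once, then only an identifier" rule described above
can be sketched with a small lookup table on the checkpoint side. This is
an illustration under assumptions, not code from the patch: struct
obj_table, obj_lookup_add() and the fixed-size array with a linear scan
are all hypothetical stand-ins for whatever the kernel code would use.

```c
#include <assert.h>
#include <stddef.h>

#define OBJ_MAX 64

/* Hypothetical checkpoint-side table: object pointer -> small integer id. */
struct obj_table {
    const void *ptr[OBJ_MAX];
    int used;
};

/*
 * Look up obj, adding it if it is new.  Returns the object's id (>= 1),
 * or -1 if the table is full.  *first is set to 1 only on the first
 * sighting, which is when the caller would write the full state; on
 * later sightings the caller would write just the returned id.
 */
static int obj_lookup_add(struct obj_table *t, const void *obj, int *first)
{
    int i;

    assert(obj != NULL);
    for (i = 0; i < t->used; i++) {
        if (t->ptr[i] == obj) {
            *first = 0;
            return i + 1;
        }
    }
    if (t->used == OBJ_MAX)
        return -1;
    t->ptr[t->used++] = obj;
    *first = 1;
    return t->used;
}
```

An nsproxy shared by two tasks would then show up in the image as one
full record under the first task and a bare id under the second.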
* Re: [RFC][PATCH 0/2] CR: save/restore a single, simple task
@ 2008-07-30 23:46 Oren Laadan
From: Oren Laadan @ 2008-07-30 23:46 UTC (permalink / raw)
To: Serge E. Hallyn; +Cc: Linux Containers

Disclaimer: long reply :)

Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>> In the current code, sys_checkpoint will checkpoint the current task,
>> although the logic exists to checkpoint other tasks (not in the
>> checkpointee's execution context). A simple loop will extend this to
>> handle multiple processes. sys_restart restarts the current tasks, and
>> with multiple tasks each task will call the syscall independently.
>
> I assume that approach worked in Zap, so there must be a simple solution
> to this, but I don't see how having each process in a container
> independently call sys_restart works for sharing. Oh, or is that where

The main reason to do that (and I thought openvz works similarly ?) is
that I want to re-use as much as possible the existing kernel functionality.
Restart differs from checkpoint in that you have to construct new resources
as opposed to only inspect existing resources. To inspect - you only need
a reference to the object and then to obtain its state by accessing it. In
contrast, to construct, you need to create a new resource.

In almost all cases, creating a resource for a process is easiest if done by
the process itself. For instance - to restore the memory map, you want the
process that owns the target mm to call mmap() (in particular, the lower
level and more convenient for us do_mmap_pgoff() function). If the process
that restores a given vma didn't own that mm, it would take much more pain
to build the vma into a "foreign" mm.

Thus, there is a huge advantage of doing everything in-context of the target
process, that is - we can re-use the existing kernel code (and spirit) to
create the resources, instead of having to hand-craft them carefully with
specialized code.

> a 'container restart context' comes in? An nsproxy has a pointer to a

More or less. At a first approximation, this is how I envision it:

0) in user space, a new (empty) container will be created with all the
needed settings for the file system etc (mounts .. and the like)

1) the first task (container init) will call sys_restart with the checkpoint
image file.

2) the code will verify the header, then read in the global section; it will
create a restart-context which will be referenced from the container-object
(one option we considered is to have the freezer-cgroup be that object).

3) using the info from that section, it will create the task tree (forest)
to be restored. In particular, new tasks will be created and each will end
up in do_restart_task() inside the kernel.

[note that in Zap, step 3 is still done in user space...]

Since all tasks live in the container, they will all have access to the
restart-context, through which all coordination is done.

At first, the restart will be performed _one task at a time_, in the order
they were dumped. So while the init task restores itself, the remaining
tasks sleep. When the init task finishes - it will wake the next in line
and so on. The last one will wake the init task to finalize the work. So:

4) each task waits (sleeps) until it is prompted to restore its own state.
When it completes, it wakes up the next task in line and goes to a freeze
state.

5) the init task finalizes the restart, and either completes the freeze or
unfreezes the container, depending on what the user requested.

This scheme makes sense because we assume that the data is streamed. So it
does not make much sense to try to restart the 5th job before the 2nd job
because the data isn't there yet. Moreover, if they refer to the same shared
object, job#5 will have to wait for job#2 to create the object, since its
state was saved with that job.

In the future, to speed up the process by concurrently restarting multiple
tasks, we'll have to read in data from the stream into a buffer (read-ahead)
and then restarting tasks could skip data that doesn't belong to them; while
they may still need to wait for shared resources to be created, other work
can be done in parallel in the meanwhile.

> checkpoint/restart context which the first task creates and all tasks
> reference and update? So task 5 created its mm_struct, task 6 is
> supposed to use the same mm_struct, so it finds that out from the
> context? I wonder whether that would start to become complicated
> when checkpointing nested containers.

Yes, that's what I had in mind - the restart context holds a hash table
that references all the shared objects that are created during the restart.
(Like the checkpoint context that will hold references to objects that
have been inspected).

Checkpointing nested containers ??? Why ?
I'm not sure why would that be a problem; but sure, we need to discuss
that using a concrete use-case and identify the needs and difficulties.

> So I still prefer the idea that the init process calls restart, and that
> creates all the tasks in the container and rebuilds them. But you have
> code, so you win :)

I agree: the init task calls restart, and that creates all the tasks in
the container. And then, make each of them call do_restart_task() in
some way :)

> Anyway I'm still reading through patch 2. It looks great to me - the
> only comments I have written so far are:
> 	1. why not just store LINUX_VERSION_CODE in the header instead
> 	   of breaking it up

hmph ... good question. Avoid 32/64 bit conversion complications ?

> 	2. the x86-specific code should of course go into arch-specific
> 	   directories, but

of course. I left it there for simplicity right now.

> neither of which really is worth the bother right now imo :)
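The restart-context hash table mentioned in this reply could look roughly
like the sketch below: the task that restores a shared object registers it
under the id found in the image, and a later task that reads the same id
looks it up instead of creating a second copy. The names struct
restart_ctx, ctx_find() and ctx_add() are hypothetical. There is no
locking, which is only acceptable under the one-task-at-a-time scheme
described above; a concurrent restart would need synchronization.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 16   /* power of two so the id can be masked */

/* One restored shared object, keyed by the id found in the image. */
struct obj_ent {
    int id;
    void *obj;
    struct obj_ent *next;
};

/* Hypothetical restart context shared by every restarting task. */
struct restart_ctx {
    struct obj_ent *bucket[NBUCKETS];
};

/* Return the object registered under id, or NULL if none exists yet. */
static void *ctx_find(const struct restart_ctx *ctx, int id)
{
    const struct obj_ent *e;

    for (e = ctx->bucket[id & (NBUCKETS - 1)]; e; e = e->next)
        if (e->id == id)
            return e->obj;
    return NULL;
}

/* Register a newly created object; the caller owns the entry's storage. */
static void ctx_add(struct restart_ctx *ctx, struct obj_ent *e, int id,
                    void *obj)
{
    struct obj_ent **head = &ctx->bucket[id & (NBUCKETS - 1)];

    assert(id > 0 && obj != NULL && ctx_find(ctx, id) == NULL);
    e->id = id;
    e->obj = obj;
    e->next = *head;
    *head = e;
}
```

In the "task 5 / task 6" example from the thread, task 5 would call
ctx_add() after building its mm, and task 6 would get the same pointer
back from ctx_find().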
* Re: [RFC][PATCH 0/2] CR: save/restore a single, simple task
@ 2008-07-31 11:23 Daniel Lezcano
From: Daniel Lezcano @ 2008-07-31 11:23 UTC (permalink / raw)
To: Oren Laadan; +Cc: Linux Containers

Oren Laadan wrote:
> More or less. At a first approximation, this is how I envision it:
>
> 0) in user space, a new (empty) container will be created with all the
> needed settings for the file system etc (mounts .. and the like)
>
> 1) the first task (container init) will call sys_restart with the checkpoint
> image file.

> Serge E. Hallyn wrote:
>> checkpoint/restart context which the first task creates and all tasks
>> reference and update? So task 5 created its mm_struct, task 6 is
>> supposed to use the same mm_struct, so it finds that out from the
>> context? I wonder whether that would start to become complicated
>> when checkpointing nested containers.
>
> Yes, that's what I had in mind - the restart context holds a hash table
> that references all the shared objects that are created during the restart.
> (Like the checkpoint context that will hold references to objects that
> have been inspected).
>
> Checkpointing nested containers ??? Why ?
> I'm not sure why would that be a problem; but sure, we need to discuss
> that using a concrete use-case and identify the needs and difficulties.

In the current proposition, we talked about creating an empty container
and the first process calls sys_restart. With nested container, we have
to CR the container itself no ? I don't see how we can CR nested
container otherwise :/
* Re: [RFC][PATCH 0/2] CR: save/restore a single, simple task [not found] ` <4891A0C4.5080906-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> @ 2008-07-31 15:25 ` Oren Laadan [not found] ` <4891D962.3020407-eQaUEPhvms7ENvBUuze7eA@public.gmane.org> 0 siblings, 1 reply; 14+ messages in thread From: Oren Laadan @ 2008-07-31 15:25 UTC (permalink / raw) To: Daniel Lezcano; +Cc: Linux Containers Daniel Lezcano wrote: > Oren Laadan wrote: >> Disclaimer: long reply :) >> >> Serge E. Hallyn wrote: >>> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org): >>>> In the recent mini-summit at OLS 2008 and the following days it was >>>> agreed to tackle the checkpoint/restart (CR) by beginning with a very >>>> simple case: save and restore a single task, with simple memory >>>> layout, disregarding other task state such as files, signals etc. >>>> >>>> Following these discussions I coded a prototype that can do exactly >>>> that, as a starter. This code adds two system calls - sys_checkpoint >>>> and sys_restart - that a task can call to save and restore its state >>>> respectively. It also demonstrates how the checkpoint image file can >>>> be formatted, as well as show its nested nature (e.g. cr_write_mm() >>>> -> cr_write_vma() nesting). >>>> >>>> The state that is saved/restored is the following: >>>> * some of the task_struct >>>> * some of the thread_struct and thread_info >>>> * the cpu state (including FPU) >>>> * the memory address space >>>> >>>> [The patch is against commit fb2e405fc1fc8b20d9c78eaa1c7fd5a297efde43 >>>> of Linus's tree (uhhh.. don't ask why), but against tonight's head >>>> too]. >>>> >>>> In the current code, sys_checkpoint will checkpoint the current task, >>>> although the logic exists to checkpoint other tasks (not in the >>>> checkpointee's execution context). A simple loop will extend this to >>>> handle multiple processes. sys_restart restarts the current tasks, and >>>> with multiple tasks each task will call the syscall independently. 
>>> I assume that approach worked in Zap, so there must be a simple solution
>>> to this, but I don't see how having each process in a container
>>> independently call sys_restart works for sharing. Oh, or is that where
>>
>> The main reason to do that (and I thought openvz works similarly ?) is
>> that I want to re-use as much as possible the existing kernel
>> functionality.
>> Restart differs from checkpoint in that you have to construct new
>> resources as opposed to only inspect existing resources. To inspect -
>> you only need a reference to the object and then to obtain its state
>> by accessing it. In contrast, to construct, you need to create a new
>> resource.
>>
>> In almost all cases, creating a resource for a process is easiest if
>> done by the process itself. For instance - to restore the memory map,
>> you want the process that owns the target mm to call mmap() (in
>> particular, the lower level and more convenient for us
>> do_mmap_pgoff() function). If the process that restores a given vma
>> didn't own that mm, it would take much more pain to build the vma
>> into a "foreign" mm.
>>
>> Thus, there is a huge advantage of doing everything in-context of the
>> target process, that is - we can re-use the existing kernel code (and
>> spirit) to create the resources, instead of having to hand-craft them
>> carefully with specialized code.
>>
>>> a 'container restart context' comes in? An nsproxy has a pointer to a
>>
>> More or less. At a first approximation, this is how I envision it:
>>
>> 0) in user space, a new (empty) container will be created with all the
>> needed settings for the file system etc (mounts .. and the like)
>>
>> 1) the first task (container init) will call sys_restart with the
>> checkpoint image file.
>>
>> 2) the code will verify the header, then read in the global section;
>> it will create a restart-context which will be referenced from the
>> container-object (one option we considered is to have the
>> freezer-cgroup be that object).
>>
>> 3) using the info from that section, it will create the task tree
>> (forest) to be restored. In particular, new tasks will be created and
>> each will end up in do_restart_task() inside the kernel.
>>
>> [note that in Zap, step 3 is still done in user space...]
>>
>> Since all tasks live in the container, they will all have access to the
>> restart-context, through which all coordination is done.
>>
>> At first, the restart will be performed _one task at a time_, at the
>> order they were dumped. So while the init task restores itself, the
>> remaining tasks sleep. When the init task finishes - it will wake the
>> next in line and so on. The last one will wake the init task to
>> finalize the work. So:
>>
>> 4) each task waits (sleeps) until it is prompted to restore its own
>> state. When it completes, it wakes up the next task in line and goes
>> to a freeze state.
>>
>> 5) the init task finalized the restart, and either completes the
>> freeze or unfreezes the container, depending on what the user
>> requested.
>>
>> This scheme makes sense because we assume that the data is streamed.
>> So it does not make much sense to try to restart the 5th job before
>> the 2nd job because the data isn't there yet. Moreover, if they refer
>> to the same shared object, job#5 will have to wait to job#2 to create
>> the object, since its state was saved with that job.
>>
>> In the future, to speed the process by concurrent restarting multiple
>> tasks, we'll have to read in data from the stream into a buffer
>> (read-ahead) and then restarting tasks could skip data that doesn't
>> belongs to them; while they may still need to wait for shared
>> resources to be created, other work can be done in parallel in the
>> meanwhile.
>>
>>> checkpoint/restart context which the first task creates and all tasks
>>> reference and update? So task 5 created its mm_struct, task 6 is
>>> supposed to use the same mm_struct, so it finds that out from the
>>> context? I wonder whether that would start to become complicated
>>> when checkpointing nested containers.
>>
>> Yes, that's what I had in mind - the restart context holds a hash table
>> that references all the shared objects that are created during the
>> restart.
>> (Like the checkpoint context that will hold references to objects that
>> have been inspected).
>>
>> Checkpointing nested containers ??? Why ?
>> I'm not sure why would that be a problem; but sure, we need to discuss
>> that using a concrete use-case and identify the needs and difficulties.
>
> In the current proposition, we talked about creating an empty container
> and the first process calls sys_restart. With nested container, we have
> to CR the container itself no ? I don't see how we can CR nested
> container otherwise :/

Probably so: with nested containers it is necessary to also save the state
of the "container-tree" (which is sort of analogous to task-tree).
In particular, because tasks in nested containers are essentially part
of the outermost container that is being checkpointed. Is this issue
specific to the proposed scheme, or a general issue of any scheme ?

I think that to tackle this, we need to first agree and implement an
object that represents a container (again, the freezer_cgroup ?).

Oren.
>
>>> So I still prefer the idea that the init process calls restart, and that
>>> creates all the tasks in the container and rebuilds them. But you have
>>> code, so you win :)
>>
>> I agree: the init task calls restart, and that creates all the tasks in
>> the container. And then, make each of them call do_restart_task() in
>> some way :)
>>
>>> Anyway I'm still reading through patch 2. It looks great to me - the
>>> only comments I have written so far are:
>>>     1. why not just store LINUX_VERSION_CODE in the header instead
>>>        of breaking it up
>>
>> hmph ... good question. Avoid 32/64 bit conversion complications ?
>>
>>>     2. the x86-specific code should of course go into arch-specific
>>>        directories, but
>>
>> of course. I left it there for simplicity right now.
>>
>>> neither of which really is worth the bother right now imo :)
>>>
>>>> (Actually, to checkpoint outside the context of a task, it is also
>>>> necessary to also handle restart-block logic when saving/restoring the
>>>> thread data).
>>>>
>>>> It takes longer to describe what isn't implemented or supported by
>>>> this prototype ... basically everything that isn't as simple as the
>>>> above.
>>>>
>>>> As for containers - since we still don't have a representation for a
>>>> container, this patch has no notion of a container. The tests for
>>>> consistent namespaces (and isolation) are also omitted.
>>>>
>>>> Below are two example programs: one uses checkpoint (called ckpt) and
>>>> one uses restart (called rstr). Execute like this (as a superuser):
>>>>
>>>>   orenl:~/test$ ./ckpt > out.1
>>>>   hello, world! (ret=1)    <-- sys_checkpoint returns positive id
>>>>                            <-- ctrl-c
>>>>   orenl:~/test$ ./ckpt > out.2
>>>>   hello, world! (ret=2)
>>>>                            <-- ctrl-c
>>>>   orenl:~/test$ ./rstr < out.1
>>>>   hello, world! (ret=0)    <-- sys_restart return 0
>>>>
>>>> (if you check the output of ps, you'll see that "rstr" changed its
>>>> name to "ckpt", as expected).
>>>>
>>>> Hoping this will accelerate the discussion. Comments are welcome.
>>>> Let the fun begin :)
>>>>
>>>> Oren.
>>>>
>>>>
>>>> ============================== ckpt.c ================================
>>>>
>>>> #define _GNU_SOURCE /* or _BSD_SOURCE or _SVID_SOURCE */
>>>>
>>>> #include <stdio.h>
>>>> #include <stdlib.h>
>>>> #include <errno.h>
>>>> #include <fcntl.h>
>>>> #include <unistd.h>
>>>> #include <asm/unistd_32.h>
>>>> #include <sys/syscall.h>
>>>>
>>>> int main(int argc, char *argv[])
>>>> {
>>>>     pid_t pid = getpid();
>>>>     int ret;
>>>>
>>>>     ret = syscall(__NR_checkpoint, pid, STDOUT_FILENO, 0);
>>>>     if (ret < 0)
>>>>         perror("checkpoint");
>>>>
>>>>     fprintf(stderr, "hello, world! (ret=%d)\n", ret);
>>>>
>>>>     while (1)
>>>>         ;
>>>>
>>>>     return 0;
>>>> }
>>>>
>>>> ============================== rstr.c ================================
>>>>
>>>> #define _GNU_SOURCE /* or _BSD_SOURCE or _SVID_SOURCE */
>>>>
>>>> #include <stdio.h>
>>>> #include <stdlib.h>
>>>> #include <errno.h>
>>>> #include <fcntl.h>
>>>> #include <unistd.h>
>>>> #include <asm/unistd_32.h>
>>>> #include <sys/syscall.h>
>>>>
>>>> int main(int argc, char *argv[])
>>>> {
>>>>     pid_t pid = getpid();
>>>>     int ret;
>>>>
>>>>     ret = syscall(__NR_restart, pid, STDIN_FILENO, 0);
>>>>     if (ret < 0)
>>>>         perror("restart");
>>>>
>>>>     printf("should not reach here !\n");
>>>>
>>>>     return 0;
>>>> }
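
The message above says the restart context holds a hash table that
references all the shared objects created during the restart, so that a
later task (task 6 in Serge's example) can find the mm_struct an earlier
task (task 5) already built. A minimal user-space sketch of such a table
follows. All names here (rst_ctx, rst_obj_add, rst_obj_find, the integer
tag) are hypothetical and are not taken from the patch; locking and
reference counting, which kernel code would need, are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RST_HASH_BUCKETS 64

/* One entry: tag recorded in the image -> object rebuilt during restart.
 * (Hypothetical layout, for illustration only.) */
struct rst_obj {
    unsigned int tag;
    void *ptr;
    struct rst_obj *next;
};

/* Hypothetical restart context: a chained hash table of shared objects. */
struct rst_ctx {
    struct rst_obj *buckets[RST_HASH_BUCKETS];
};

static unsigned int rst_hash(unsigned int tag)
{
    return tag % RST_HASH_BUCKETS;
}

/* Return the object already created for this tag, or NULL if none. */
void *rst_obj_find(const struct rst_ctx *ctx, unsigned int tag)
{
    const struct rst_obj *o;

    for (o = ctx->buckets[rst_hash(tag)]; o != NULL; o = o->next)
        if (o->tag == tag)
            return o->ptr;
    return NULL;
}

/* Record a newly created object. Returns 0 on success, -1 if the tag is
 * already present or memory allocation fails. */
int rst_obj_add(struct rst_ctx *ctx, unsigned int tag, void *ptr)
{
    struct rst_obj *o;
    unsigned int b = rst_hash(tag);

    assert(ptr != NULL);    /* NULL is reserved for "not found" */
    if (rst_obj_find(ctx, tag) != NULL)
        return -1;
    o = malloc(sizeof(*o));
    if (o == NULL)
        return -1;
    o->tag = tag;
    o->ptr = ptr;
    o->next = ctx->buckets[b];
    ctx->buckets[b] = o;
    return 0;
}

/* Drop all entries; the referenced objects themselves are not freed. */
void rst_ctx_free(struct rst_ctx *ctx)
{
    unsigned int b;

    for (b = 0; b < RST_HASH_BUCKETS; b++) {
        struct rst_obj *o = ctx->buckets[b];

        while (o != NULL) {
            struct rst_obj *next = o->next;

            free(o);
            o = next;
        }
        ctx->buckets[b] = NULL;
    }
}
```

In this sketch a restoring task would first call rst_obj_find() with the
tag read from the image; only on a miss would it create the object and
publish it with rst_obj_add(), so whichever task reaches a shared object
first becomes its creator.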
[parent not found: <4891D962.3020407-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>]
* Re: [RFC][PATCH 0/2] CR: save/restore a single, simple task
[not found] ` <4891D962.3020407-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2008-07-31 17:15 ` Daniel Lezcano
0 siblings, 0 replies; 14+ messages in thread
From: Daniel Lezcano @ 2008-07-31 17:15 UTC (permalink / raw)
To: Oren Laadan; +Cc: Linux Containers

Oren Laadan wrote:
>
> Daniel Lezcano wrote:
>> Oren Laadan wrote:
>>> Disclaimer: long reply :)
>>>
>>> Serge E. Hallyn wrote:
>>>> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>>>>> In the recent mini-summit at OLS 2008 and the following days it was
>>>>> agreed to tackle the checkpoint/restart (CR) by beginning with a very
>>>>> simple case: save and restore a single task, with simple memory
>>>>> layout, disregarding other task state such as files, signals etc.
>>>>>
>>>>> Following these discussions I coded a prototype that can do exactly
>>>>> that, as a starter. This code adds two system calls - sys_checkpoint
>>>>> and sys_restart - that a task can call to save and restore its state
>>>>> respectively. It also demonstrates how the checkpoint image file can
>>>>> be formatted, as well as show its nested nature (e.g. cr_write_mm()
>>>>> -> cr_write_vma() nesting).
>>>>>
>>>>> The state that is saved/restored is the following:
>>>>> * some of the task_struct
>>>>> * some of the thread_struct and thread_info
>>>>> * the cpu state (including FPU)
>>>>> * the memory address space
>>>>>
>>>>> [The patch is against commit fb2e405fc1fc8b20d9c78eaa1c7fd5a297efde43
>>>>> of Linus's tree (uhhh.. don't ask why), but against tonight's head
>>>>> too].
>>>>>
>>>>> In the current code, sys_checkpoint will checkpoint the current task,
>>>>> although the logic exists to checkpoint other tasks (not in the
>>>>> checkpointee's execution context). A simple loop will extend this to
>>>>> handle multiple processes. sys_restart restarts the current tasks, and
>>>>> with multiple tasks each task will call the syscall independently.
>>>> I assume that approach worked in Zap, so there must be a simple solution
>>>> to this, but I don't see how having each process in a container
>>>> independently call sys_restart works for sharing. Oh, or is that where
>>> The main reason to do that (and I thought openvz works similarly ?) is
>>> that I want to re-use as much as possible the existing kernel
>>> functionality.
>>> Restart differs from checkpoint in that you have to construct new
>>> resources as opposed to only inspect existing resources. To inspect -
>>> you only need a reference to the object and then to obtain its state
>>> by accessing it. In contrast, to construct, you need to create a new
>>> resource.
>>>
>>> In almost all cases, creating a resource for a process is easiest if
>>> done by the process itself. For instance - to restore the memory map,
>>> you want the process that owns the target mm to call mmap() (in
>>> particular, the lower level and more convenient for us
>>> do_mmap_pgoff() function). If the process that restores a given vma
>>> didn't own that mm, it would take much more pain to build the vma
>>> into a "foreign" mm.
>>>
>>> Thus, there is a huge advantage of doing everything in-context of the
>>> target process, that is - we can re-use the existing kernel code (and
>>> spirit) to create the resources, instead of having to hand-craft them
>>> carefully with specialized code.
>>>
>>>> a 'container restart context' comes in? An nsproxy has a pointer to a
>>> More or less. At a first approximation, this is how I envision it:
>>>
>>> 0) in user space, a new (empty) container will be created with all the
>>> needed settings for the file system etc (mounts .. and the like)
>>>
>>> 1) the first task (container init) will call sys_restart with the
>>> checkpoint image file.
>>>
>>> 2) the code will verify the header, then read in the global section;
>>> it will create a restart-context which will be referenced from the
>>> container-object (one option we considered is to have the
>>> freezer-cgroup be that object).
>>>
>>> 3) using the info from that section, it will create the task tree
>>> (forest) to be restored. In particular, new tasks will be created and
>>> each will end up in do_restart_task() inside the kernel.
>>>
>>> [note that in Zap, step 3 is still done in user space...]
>>>
>>> Since all tasks live in the container, they will all have access to the
>>> restart-context, through which all coordination is done.
>>>
>>> At first, the restart will be performed _one task at a time_, at the
>>> order they were dumped. So while the init task restores itself, the
>>> remaining tasks sleep. When the init task finishes - it will wake the
>>> next in line and so on. The last one will wake the init task to
>>> finalize the work. So:
>>>
>>> 4) each task waits (sleeps) until it is prompted to restore its own
>>> state. When it completes, it wakes up the next task in line and goes
>>> to a freeze state.
>>>
>>> 5) the init task finalized the restart, and either completes the
>>> freeze or unfreezes the container, depending on what the user
>>> requested.
>>>
>>> This scheme makes sense because we assume that the data is streamed.
>>> So it does not make much sense to try to restart the 5th job before
>>> the 2nd job because the data isn't there yet. Moreover, if they refer
>>> to the same shared object, job#5 will have to wait to job#2 to create
>>> the object, since its state was saved with that job.
>>>
>>> In the future, to speed the process by concurrent restarting multiple
>>> tasks, we'll have to read in data from the stream into a buffer
>>> (read-ahead) and then restarting tasks could skip data that doesn't
>>> belongs to them; while they may still need to wait for shared
>>> resources to be created, other work can be done in parallel in the
>>> meanwhile.
>>>
>>>> checkpoint/restart context which the first task creates and all tasks
>>>> reference and update? So task 5 created its mm_struct, task 6 is
>>>> supposed to use the same mm_struct, so it finds that out from the
>>>> context? I wonder whether that would start to become complicated
>>>> when checkpointing nested containers.
>>> Yes, that's what I had in mind - the restart context holds a hash table
>>> that references all the shared objects that are created during the
>>> restart.
>>> (Like the checkpoint context that will hold references to objects that
>>> have been inspected).
>>>
>>> Checkpointing nested containers ??? Why ?
>>> I'm not sure why would that be a problem; but sure, we need to discuss
>>> that using a concrete use-case and identify the needs and difficulties.
>> In the current proposition, we talked about creating an empty container
>> and the first process calls sys_restart. With nested container, we have
>> to CR the container itself no ? I don't see how we can CR nested
>> container otherwise :/
>
> Probably so: with nested containers it is necessary to also save the state
> of the "container-tree" (which is sort of analogous to task-tree).
> In particular, because tasks in nested containers are essentially part
> of the outermost container that is being checkpointed. Is this issue
> specific to the proposed scheme, or a general issue of any scheme ?

I meant it as an issue with the proposed scheme: how do we call
sys_restart recursively on the pid 1 of a nested container, if we want to
create the container and have its first process call sys_restart ?
But anyway, let's checkpoint a single container first :)

> I think that to tackle this, we need to first agree and implement an
> object that represents a container (again, the freezer_cgroup ?).

Didn't we settle on creating a checkpoint/restart control group
sub-system in which the context is allocated ?

>>>> So I still prefer the idea that the init process calls restart, and that
>>>> creates all the tasks in the container and rebuilds them. But you have
>>>> code, so you win :)
>>> I agree: the init task calls restart, and that creates all the tasks in
>>> the container. And then, make each of them call do_restart_task() in
>>> some way :)
>>>
>>>> Anyway I'm still reading through patch 2. It looks great to me - the
>>>> only comments I have written so far are:
>>>>     1. why not just store LINUX_VERSION_CODE in the header instead
>>>>        of breaking it up
>>> hmph ... good question. Avoid 32/64 bit conversion complications ?
>>>
>>>>     2. the x86-specific code should of course go into arch-specific
>>>>        directories, but
>>> of course. I left it there for simplicity right now.
>>>
>>>> neither of which really is worth the bother right now imo :)
>>>>
>>>>> (Actually, to checkpoint outside the context of a task, it is also
>>>>> necessary to also handle restart-block logic when saving/restoring the
>>>>> thread data).
>>>>>
>>>>> It takes longer to describe what isn't implemented or supported by
>>>>> this prototype ... basically everything that isn't as simple as the
>>>>> above.
>>>>>
>>>>> As for containers - since we still don't have a representation for a
>>>>> container, this patch has no notion of a container. The tests for
>>>>> consistent namespaces (and isolation) are also omitted.
>>>>>
>>>>> Below are two example programs: one uses checkpoint (called ckpt) and
>>>>> one uses restart (called rstr). Execute like this (as a superuser):
>>>>>
>>>>>   orenl:~/test$ ./ckpt > out.1
>>>>>   hello, world! (ret=1)    <-- sys_checkpoint returns positive id
>>>>>                            <-- ctrl-c
>>>>>   orenl:~/test$ ./ckpt > out.2
>>>>>   hello, world! (ret=2)
>>>>>                            <-- ctrl-c
>>>>>   orenl:~/test$ ./rstr < out.1
>>>>>   hello, world! (ret=0)    <-- sys_restart return 0
>>>>>
>>>>> (if you check the output of ps, you'll see that "rstr" changed its
>>>>> name to "ckpt", as expected).
>>>>>
>>>>> Hoping this will accelerate the discussion. Comments are welcome.
>>>>> Let the fun begin :)
>>>>>
>>>>> Oren.
>>>>>
>>>>>
>>>>> ============================== ckpt.c ================================
>>>>>
>>>>> #define _GNU_SOURCE /* or _BSD_SOURCE or _SVID_SOURCE */
>>>>>
>>>>> #include <stdio.h>
>>>>> #include <stdlib.h>
>>>>> #include <errno.h>
>>>>> #include <fcntl.h>
>>>>> #include <unistd.h>
>>>>> #include <asm/unistd_32.h>
>>>>> #include <sys/syscall.h>
>>>>>
>>>>> int main(int argc, char *argv[])
>>>>> {
>>>>>     pid_t pid = getpid();
>>>>>     int ret;
>>>>>
>>>>>     ret = syscall(__NR_checkpoint, pid, STDOUT_FILENO, 0);
>>>>>     if (ret < 0)
>>>>>         perror("checkpoint");
>>>>>
>>>>>     fprintf(stderr, "hello, world! (ret=%d)\n", ret);
>>>>>
>>>>>     while (1)
>>>>>         ;
>>>>>
>>>>>     return 0;
>>>>> }
>>>>>
>>>>> ============================== rstr.c ================================
>>>>>
>>>>> #define _GNU_SOURCE /* or _BSD_SOURCE or _SVID_SOURCE */
>>>>>
>>>>> #include <stdio.h>
>>>>> #include <stdlib.h>
>>>>> #include <errno.h>
>>>>> #include <fcntl.h>
>>>>> #include <unistd.h>
>>>>> #include <asm/unistd_32.h>
>>>>> #include <sys/syscall.h>
>>>>>
>>>>> int main(int argc, char *argv[])
>>>>> {
>>>>>     pid_t pid = getpid();
>>>>>     int ret;
>>>>>
>>>>>     ret = syscall(__NR_restart, pid, STDIN_FILENO, 0);
>>>>>     if (ret < 0)
>>>>>         perror("restart");
>>>>>
>>>>>     printf("should not reach here !\n");
>>>>>
>>>>>     return 0;
>>>>> }
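
The quoted plan for concurrent restart has the image data read ahead into
a buffer, with each restarting task skipping data that does not belong to
it. One way this could work, sketched below under stated assumptions, is
to prefix every record with its owner and payload length so a reader can
step over foreign records without parsing them. The rec_hdr layout and
the rec_put()/rec_next_for() helpers are hypothetical and do not reflect
the image format used by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header; the real image format may differ. */
struct rec_hdr {
    uint32_t owner;   /* index of the task the record belongs to */
    uint32_t len;     /* payload length in bytes, header excluded */
};

/* Append one record to buf at offset *pos.
 * Returns 0 on success, -1 if the record does not fit in cap bytes. */
int rec_put(unsigned char *buf, size_t cap, size_t *pos,
            uint32_t owner, const void *payload, uint32_t len)
{
    struct rec_hdr h = { owner, len };

    assert(buf != NULL && pos != NULL);
    if (*pos > cap || cap - *pos < sizeof(h) ||
        cap - *pos - sizeof(h) < len)
        return -1;
    memcpy(buf + *pos, &h, sizeof(h));
    memcpy(buf + *pos + sizeof(h), payload, len);
    *pos += sizeof(h) + len;
    return 0;
}

/* Starting at *pos, find the next record owned by `owner`, skipping
 * records that belong to other tasks. On a hit, store the payload
 * pointer and length, advance *pos past the record and return 1.
 * Returns 0 when the data is exhausted, -1 if a record is truncated. */
int rec_next_for(const unsigned char *buf, size_t size, size_t *pos,
                 uint32_t owner, const unsigned char **payload,
                 uint32_t *len)
{
    assert(buf != NULL && pos != NULL && payload != NULL && len != NULL);
    while (*pos < size) {
        struct rec_hdr h;

        if (size - *pos < sizeof(h))
            return -1;
        memcpy(&h, buf + *pos, sizeof(h));   /* avoids unaligned access */
        if (size - *pos - sizeof(h) < h.len)
            return -1;
        *pos += sizeof(h) + h.len;
        if (h.owner == owner) {
            *payload = buf + *pos - h.len;
            *len = h.len;
            return 1;
        }
    }
    return 0;
}
```

Each task would keep its own scan position over the shared read-ahead
buffer; waiting for shared objects created by another task, as the
message notes, would still have to be coordinated separately.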
* Re: [RFC][PATCH 0/2] CR: save/restore a single, simple task
[not found] ` <Pine.LNX.4.64.0807292306570.9868-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
2008-07-30 21:35 ` Serge E. Hallyn
@ 2008-07-30 22:16 ` Serge E. Hallyn
2008-07-31 1:11 ` [Devel] " Andrey Mirkin
2 siblings, 0 replies; 14+ messages in thread
From: Serge E. Hallyn @ 2008-07-30 22:16 UTC (permalink / raw)
To: Oren Laadan; +Cc: Linux Containers

Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>
> In the recent mini-summit at OLS 2008 and the following days it was
> agreed to tackle the checkpoint/restart (CR) by beginning with a very
> simple case: save and restore a single task, with simple memory
> layout, disregarding other task state such as files, signals etc.
>
> Following these discussions I coded a prototype that can do exactly
> that, as a starter. This code adds two system calls - sys_checkpoint
> and sys_restart - that a task can call to save and restore its state
> respectively. It also demonstrates how the checkpoint image file can
> be formatted, as well as show its nested nature (e.g. cr_write_mm()
> -> cr_write_vma() nesting).
>
> The state that is saved/restored is the following:
> * some of the task_struct
> * some of the thread_struct and thread_info
> * the cpu state (including FPU)
> * the memory address space
>
> [The patch is against commit fb2e405fc1fc8b20d9c78eaa1c7fd5a297efde43
> of Linus's tree (uhhh.. don't ask why), but against tonight's head too].
>
> In the current code, sys_checkpoint will checkpoint the current task,
> although the logic exists to checkpoint other tasks (not in the
> checkpointee's execution context). A simple loop will extend this to
> handle multiple processes. sys_restart restarts the current tasks, and
> with multiple tasks each task will call the syscall independently.
> (Actually, to checkpoint outside the context of a task, it is also
> necessary to also handle restart-block logic when saving/restoring the
> thread data).
>
> It takes longer to describe what isn't implemented or supported by
> this prototype ... basically everything that isn't as simple as the
> above.
>
> As for containers - since we still don't have a representation for a
> container, this patch has no notion of a container. The tests for
> consistent namespaces (and isolation) are also omitted.
>
> Below are two example programs: one uses checkpoint (called ckpt) and
> one uses restart (called rstr). Execute like this (as a superuser):
>
>   orenl:~/test$ ./ckpt > out.1
>   hello, world! (ret=1)    <-- sys_checkpoint returns positive id
>                            <-- ctrl-c
>   orenl:~/test$ ./ckpt > out.2
>   hello, world! (ret=2)
>                            <-- ctrl-c
>   orenl:~/test$ ./rstr < out.1
>   hello, world! (ret=0)    <-- sys_restart return 0
>
> (if you check the output of ps, you'll see that "rstr" changed its
> name to "ckpt", as expected).
>
> Hoping this will accelerate the discussion. Comments are welcome.
> Let the fun begin :)

Compile, boot, and c/r-tested on my f9 kvm image.

-serge
* Re: [Devel] [RFC][PATCH 0/2] CR: save/restore a single, simple task
[not found] ` <Pine.LNX.4.64.0807292306570.9868-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
2008-07-30 21:35 ` Serge E. Hallyn
2008-07-30 22:16 ` Serge E. Hallyn
@ 2008-07-31 1:11 ` Andrey Mirkin
[not found] ` <200807310511.11648.major-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2 siblings, 1 reply; 14+ messages in thread
From: Andrey Mirkin @ 2008-07-31 1:11 UTC (permalink / raw)
To: devel-GEFAQzZX7r8dnm+yROfE0A; +Cc: Linux Containers

Hello Oren,

That is great, that you have proposed your version of checkpointing/restart.
In a few days I will send a patchset with OpenVZ checkpointing/restart.
So, we will be able to compare our approaches and take the best parts from
both.

Regards,
Andrey

On Wednesday 30 July 2008 07:24 Oren Laadan wrote:
> In the recent mini-summit at OLS 2008 and the following days it was
> agreed to tackle the checkpoint/restart (CR) by beginning with a very
> simple case: save and restore a single task, with simple memory
> layout, disregarding other task state such as files, signals etc.
>
> Following these discussions I coded a prototype that can do exactly
> that, as a starter. This code adds two system calls - sys_checkpoint
> and sys_restart - that a task can call to save and restore its state
> respectively. It also demonstrates how the checkpoint image file can
> be formatted, as well as show its nested nature (e.g. cr_write_mm()
> -> cr_write_vma() nesting).
>
> The state that is saved/restored is the following:
> * some of the task_struct
> * some of the thread_struct and thread_info
> * the cpu state (including FPU)
> * the memory address space
>
> [The patch is against commit fb2e405fc1fc8b20d9c78eaa1c7fd5a297efde43
> of Linus's tree (uhhh.. don't ask why), but against tonight's head too].
>
> In the current code, sys_checkpoint will checkpoint the current task,
> although the logic exists to checkpoint other tasks (not in the
> checkpointee's execution context). A simple loop will extend this to
> handle multiple processes. sys_restart restarts the current tasks, and
> with multiple tasks each task will call the syscall independently.
> (Actually, to checkpoint outside the context of a task, it is also
> necessary to also handle restart-block logic when saving/restoring the
> thread data).
>
> It takes longer to describe what isn't implemented or supported by
> this prototype ... basically everything that isn't as simple as the
> above.
>
> As for containers - since we still don't have a representation for a
> container, this patch has no notion of a container. The tests for
> consistent namespaces (and isolation) are also omitted.
>
> Below are two example programs: one uses checkpoint (called ckpt) and
> one uses restart (called rstr). Execute like this (as a superuser):
>
>   orenl:~/test$ ./ckpt > out.1
>   hello, world! (ret=1)    <-- sys_checkpoint returns positive id
>                            <-- ctrl-c
>   orenl:~/test$ ./ckpt > out.2
>   hello, world! (ret=2)
>                            <-- ctrl-c
>   orenl:~/test$ ./rstr < out.1
>   hello, world! (ret=0)    <-- sys_restart return 0
>
> (if you check the output of ps, you'll see that "rstr" changed its
> name to "ckpt", as expected).
>
> Hoping this will accelerate the discussion. Comments are welcome.
> Let the fun begin :)
>
> Oren.
>
>
> ============================== ckpt.c ================================
>
> #define _GNU_SOURCE /* or _BSD_SOURCE or _SVID_SOURCE */
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <errno.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <asm/unistd_32.h>
> #include <sys/syscall.h>
>
> int main(int argc, char *argv[])
> {
>     pid_t pid = getpid();
>     int ret;
>
>     ret = syscall(__NR_checkpoint, pid, STDOUT_FILENO, 0);
>     if (ret < 0)
>         perror("checkpoint");
>
>     fprintf(stderr, "hello, world! (ret=%d)\n", ret);
>
>     while (1)
>         ;
>
>     return 0;
> }
>
> ============================== rstr.c ================================
>
> #define _GNU_SOURCE /* or _BSD_SOURCE or _SVID_SOURCE */
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <errno.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <asm/unistd_32.h>
> #include <sys/syscall.h>
>
> int main(int argc, char *argv[])
> {
>     pid_t pid = getpid();
>     int ret;
>
>     ret = syscall(__NR_restart, pid, STDIN_FILENO, 0);
>     if (ret < 0)
>         perror("restart");
>
>     printf("should not reach here !\n");
>
>     return 0;
> }
[parent not found: <200807310511.11648.major-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>]
* Re: [Devel] [RFC][PATCH 0/2] CR: save/restore a single, simple task
[not found] ` <200807310511.11648.major-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
@ 2008-07-31 21:28 ` Serge E. Hallyn
[not found] ` <20080731212810.GB7858-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 14+ messages in thread
From: Serge E. Hallyn @ 2008-07-31 21:28 UTC (permalink / raw)
To: Andrey Mirkin; +Cc: Linux Containers

Quoting Andrey Mirkin (major-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org):
> Hello Oren,
>
> That is great, that you have proposed your version of checkpointing/restart.
> In a few days I will send a patchset with OpenVZ checkpointing/restart.
> So, we will be able to compare our approaches and take the best parts from
> both.

Excellent, looking forward to it! Are you going to stick to the same
limitations as Oren did? (I think it would be best)

-serge
[parent not found: <20080731212810.GB7858-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>]
* Re: [Devel] [RFC][PATCH 0/2] CR: save/restore a single, simple task
[not found] ` <20080731212810.GB7858-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2008-08-01 5:28 ` Andrey Mirkin
[not found] ` <200808010928.21220.major-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
0 siblings, 1 reply; 14+ messages in thread
From: Andrey Mirkin @ 2008-08-01 5:28 UTC (permalink / raw)
To: Serge E. Hallyn; +Cc: Linux Containers

On Friday 01 August 2008 01:28 Serge E. Hallyn wrote:
> Quoting Andrey Mirkin (major-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org):
> > Hello Oren,
> >
> > That is great, that you have proposed your version of
> > checkpointing/restart. In a few days I will send a patchset with OpenVZ
> > checkpointing/restart. So, we will be able to compare our approaches and
> > take the best parts from both.
>
> Excellent, looking forward to it! Are you going to stick to the same
> limitations as Oren did? (I think it would be best)

Yes, I'll send patches which are able to checkpoint/restart just a single,
simple process.

Andrey
[parent not found: <200808010928.21220.major-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>]
* Re: [Devel] [RFC][PATCH 0/2] CR: save/restore a single, simple task
[not found] ` <200808010928.21220.major-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
@ 2008-08-21 21:37 ` Serge E. Hallyn
[not found] ` <20080821213724.GA17862-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 14+ messages in thread
From: Serge E. Hallyn @ 2008-08-21 21:37 UTC (permalink / raw)
To: Andrey Mirkin; +Cc: Linux Containers

Quoting Andrey Mirkin (major-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org):
> On Friday 01 August 2008 01:28 Serge E. Hallyn wrote:
> > Quoting Andrey Mirkin (major-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org):
> > > Hello Oren,
> > >
> > > That is great, that you have proposed your version of
> > > checkpointing/restart. In a few days I will send a patchset with OpenVZ
> > > checkpointing/restart. So, we will be able to compare our approaches and
> > > take the best parts from both.
> >
> > Excellent, looking forward to it! Are you going to stick to the same
> > limitations as Oren did? (I think it would be best)
> Yes, I'll send patches which are able to checkpoint/restart just a single,
> simple process.

Hi Andrey,

don't mean to be pushy, but do you have an estimate for when you might
get that patchset out?

thanks,
-serge
[parent not found: <20080821213724.GA17862-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>]
* Re: [Devel] [RFC][PATCH 0/2] CR: save/restore a single, simple task
[not found] ` <20080821213724.GA17862-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2008-09-03 11:10 ` Andrey Mirkin
0 siblings, 0 replies; 14+ messages in thread
From: Andrey Mirkin @ 2008-09-03 11:10 UTC (permalink / raw)
To: devel-GEFAQzZX7r8dnm+yROfE0A; +Cc: Linux Containers

On Friday 22 August 2008 01:37 Serge E. Hallyn wrote:
> Quoting Andrey Mirkin (major-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org):
> > On Friday 01 August 2008 01:28 Serge E. Hallyn wrote:
> > > Quoting Andrey Mirkin (major-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org):
> > > > Hello Oren,
> > > >
> > > > That is great, that you have proposed your version of
> > > > checkpointing/restart. In a few days I will send a patchset with
> > > > OpenVZ checkpointing/restart. So, we will be able to compare our
> > > > approaches and take the best parts from both.
> > >
> > > Excellent, looking forward to it! Are you going to stick to the same
> > > limitations as Oren did? (I think it would be best)
> >
> > Yes, I'll send patches which are able to checkpoint/restart just a
> > single, simple process.
>
> Hi Andrey,
>
> don't mean to be pushy, but do you have an estimate for when you might
> get that patchset out?
>

Hi Serge,

I've just sent the OpenVZ kernel-based checkpointing/restart patchset.
I was on vacation, and unfortunately my laptop power adapter died while I
was far away from home :(
Sorry for such a huge delay, I should have sent my patchset before the
vacation... Shame on me.

Andrey
end of thread, other threads:[~2008-09-03 11:10 UTC | newest]
Thread overview: 14+ messages
2008-07-30 3:24 [RFC][PATCH 0/2] CR: save/restore a single, simple task Oren Laadan
[not found] ` <Pine.LNX.4.64.0807292306570.9868-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
2008-07-30 21:35 ` Serge E. Hallyn
[not found] ` <20080730213541.GA24192-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-07-30 21:40 ` Dave Hansen
2008-07-31 0:37 ` Oren Laadan
2008-07-30 23:46 ` Oren Laadan
[not found] ` <4890FD57.7050601-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-07-31 11:23 ` Daniel Lezcano
[not found] ` <4891A0C4.5080906-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
2008-07-31 15:25 ` Oren Laadan
[not found] ` <4891D962.3020407-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-07-31 17:15 ` Daniel Lezcano
2008-07-30 22:16 ` Serge E. Hallyn
2008-07-31 1:11 ` [Devel] " Andrey Mirkin
[not found] ` <200807310511.11648.major-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2008-07-31 21:28 ` Serge E. Hallyn
[not found] ` <20080731212810.GB7858-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-08-01 5:28 ` Andrey Mirkin
[not found] ` <200808010928.21220.major-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2008-08-21 21:37 ` Serge E. Hallyn
[not found] ` <20080821213724.GA17862-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-09-03 11:10 ` Andrey Mirkin