* Re: [c/r]A problem met when using linux c/r
From: Liu Aleaxander @ 2009-10-27 2:21 UTC
To: Oren Laadan, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Tue, Oct 27, 2009 at 8:46 AM, Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org> wrote:

> Hi,
>
> Thanks for your report !
>
> Liu Aleaxander wrote:
> > Hi orenl:
> >
> > I met a problem when trying to use these samples given in
> > Documentation/checkpoint. I just followed the instructions in usage.txt
> > to do a self-checkpoint, but with no luck, it failed. Here is a dump of
> > /tmp/cr-test.out and self.image:
> >
> > $ cat /tmp/cr-test.out
> > hello, world (80.86)!
> > count 0 (80.86)!
> > count 1 (80.96)!
> > count 2 (81.16)!
> > ckpt: Invalid argument(-22)
> >
> > $cat self.image
> > '[err -22]: not container init
> >
> >
> > Then I searched the Internet and read the readme.txt again, found I may
> > should compiled the 'container' in Kernel, then following the
> > instructions in the lxc main page [http://lxc.sourceforge.net/lxc.html],
> > I recompiled the kernel, to make the 'container' contained in the
> > Kernel, but with no luck, it's still the same; can't work, and with the
> > same error. I am still thinking it's a problem of the container. But I
> > don't know why and how to fix it. So, can you please tell me where is
> > wrong? And how can I use the checkpoint/restart correctly. Thanks!
>
> The problem is in the sample code :( In particular, it should use
> the CHECKPOINT_SUBTREE flag as an argument to the syscall, rather
> than pass a '0'.
>
> (See also the example in contrib/ directory in user-cr).
>

I checked it again (BTW, I found some new typos, too; I'll patch them later),
but it didn't work either. Well, at least it succeeded in checkpointing, but
it failed in restarting. The restart command was followed directly by an error:

$ ./self_restart < self.image
Killed

And here is a small dump of dmesg:

[4959:4959:c/r:ckpt_read_obj:367] type 1 len 72(72,72)
[4959:4959:c/r:_ckpt_read_obj:259] type 4 len 73(73,73)
[4959:4959:c/r:_ckpt_read_obj:259] type 4 len 73(73,73)
[4959:4959:c/r:_ckpt_read_obj:259] type 4 len 73(73,73)
[4959:4959:c/r:ckpt_read_obj:367] type 2 len 16(16,16)
[4959:4959:c/r:do_restore_coord:1176] restore header: 0
[4959:4959:c/r:ckpt_read_obj:367] type 3 len 8(8,8)
[4959:4959:c/r:do_restore_coord:1180] restore container: 0
[4959:4959:c/r:ckpt_read_obj:367] type 101 len 16(16,16)
[4959:4959:c/r:_ckpt_read_obj:259] type 4 len 32(32,32)
[4959:4959:c/r:do_restore_coord:1184] restore tree: 24
[4959:4959:c/r:do_restore_coord:1218] pre restore task: 0
[4959:4959:c/r:ckpt_read_obj:367] type 102 len 64(64,64)
[4959:4959:c/r:_ckpt_read_obj:259] type 5 len 24(24,24)
[4959:4959:c/r:restore_task:879] task 0
[4959:4959:c/r:do_restore_coord:1222] restore task: -22
[4959:4959:c/r:walk_task_subtree:338] total 0 ret 0
[4959:4959:c/r:clear_task_ctx:763] task 4959 clear checkpoint_ctx
[4959:4959:c/r:do_restart:1347] restart err -22, exiting
[4959:4959:c/r:do_restart:1354] sys_restart returns -22
[4959:4959:c/r:restore_debug_free:141] 1 tasks registered, nr_tasks was 0 nr_total 0
[4959:4959:c/r:restore_debug_free:144] active pid was -1, ctx->errno -22
[4959:4959:c/r:restore_debug_free:146] kflags 10 uflags 1 oflags 1
[4959:4959:c/r:restore_debug_free:173] pid 4959 type Coord state Failed

> >
> >
> > BTW, I'm not sure the output of /tmp/cr-test.out in usage.txt is right.
> > Here is the output(from usage.txt):
> > $ cat /tmp/cr-rest.out
> > hello, world (85.46)!
> > count 0 (85.46)!
> > count 1 (85.56)!
> > count 2 (85.76)!
> > count 3 (86.46)!
> >
> > I think between count2 and count3, there is a statement more: like
> > "checkpoint ret: 1" or others, since at the 2nd loop, it will call
> > checkpoint, and will print the ret in that file.
>
> Good catch...
>
> I pushed fixes to both issues to the git repository.
>
> Thanks,
>
> Oren.
>
> (p.s. please CC the containers mailing list in the future).
>

--
regards
Liu Aleaxander
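A note for readers following the logs: the sample output above pairs "Invalid argument" with -22, which matches the usual kernel convention of returning a negated errno value (22 is EINVAL on Linux). The following is a minimal sketch, not part of the c/r patches or of user-cr, of turning such a return code into readable text. The helper name `describe_kernel_ret` is made up for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the c/r tree): format a kernel-style
 * return code, such as the -22 in the log above, as readable text.
 * Negative values are treated as negated errno numbers.
 */
static const char *describe_kernel_ret(int ret, char *buf, size_t len)
{
	if (ret >= 0)
		snprintf(buf, len, "ok (%d)", ret);
	else
		snprintf(buf, len, "%s (%d)", strerror(-ret), ret);
	return buf;
}
```

Called with -22 and a 64-byte buffer, this should yield the same "Invalid argument" wording that the sample program printed, so a log line such as "restore task: -22" can be read as "restoring the task failed with EINVAL".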
* Re: [c/r]A problem met when using linux c/r
From: Liu Aleaxander @ 2009-10-27 3:14 UTC
To: Oren Laadan, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Tue, Oct 27, 2009 at 10:21 AM, Liu Aleaxander <aleaxander-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

>
> And here is a small dump of dmesg:
>

Oh, sorry, that dump is old (I hadn't compiled the kernel with the new patch
yet); here is the new one:

[2880:2880:c/r:ckpt_read_obj:370] type 1 len 72(72,72)
[2880:2880:c/r:_ckpt_read_obj:262] type 4 len 73(73,73)
[2880:2880:c/r:_ckpt_read_obj:262] type 4 len 73(73,73)
[2880:2880:c/r:_ckpt_read_obj:262] type 4 len 73(73,73)
[2880:2880:c/r:ckpt_read_obj:370] type 2 len 16(16,16)
[2880:2880:c/r:do_restore_coord:1185] restore header: 0
[2880:2880:c/r:ckpt_read_obj:370] type 3 len 8(8,8)
[2880:2880:c/r:do_restore_coord:1189] restore container: 0
[2880:2880:c/r:ckpt_read_obj:370] type 101 len 16(16,16)
[2880:2880:c/r:_ckpt_read_obj:262] type 4 len 32(32,32)
[2880:2880:c/r:do_restore_coord:1193] restore tree: 24
[2880:2880:c/r:do_restore_coord:1227] pre restore task: 0
[2880:2880:c/r:ckpt_read_obj:370] type 102 len 64(64,64)
[2880:2880:c/r:_ckpt_read_obj:262] type 5 len 24(24,24)
[2880:2880:c/r:restore_task:879] task 0
[2880:2880:c/r:do_restore_coord:1231] restore task: -22
[2880:2880:c/r:walk_task_subtree:338] total 0 ret 0
[2880:2880:c/r:clear_task_ctx:772] task 2880 clear checkpoint_ctx
[2880:2880:c/r:do_restart:1367] restart err -22, exiting
[2880:2880:c/r:do_restart:1374] sys_restart returns -22
[2880:2880:c/r:restore_debug_free:141] 1 tasks registered, nr_tasks was 0 nr_total 0
[2880:2880:c/r:restore_debug_free:144] active pid was -1, ctx->errno -22
[2880:2880:c/r:restore_debug_free:146] kflags 10 uflags 1 oflags 1
[2880:2880:c/r:restore_debug_free:173] pid 2880 type Coord state Failed

--
regards
Liu Aleaxander
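When comparing two dumps like the ones above, it can help to split each line's bracketed prefix into its parts, since only the source line numbers differ between the old and new kernels. Below is a rough sketch of such a parser. The struct and function names are invented for illustration, and the thread does not say what the two leading numbers mean, so they are kept as plain ids rather than labelled pid or tid.

```c
#include <stdio.h>

/*
 * Fields of the "[n:n:c/r:function:line]" prefix seen in the dumps above.
 * Names are illustrative only; the meaning of the two leading numbers
 * is not stated in the thread.
 */
struct cr_log_prefix {
	int id_a;
	int id_b;
	char func[64];
	int line;
};

/* Returns 0 on success, -1 if the text does not start with that layout. */
static int parse_cr_log_prefix(const char *text, struct cr_log_prefix *out)
{
	int n = sscanf(text, "[%d:%d:c/r:%63[^:]:%d]",
		       &out->id_a, &out->id_b, out->func, &out->line);

	return n == 4 ? 0 : -1;
}
```

With this, the "restore task: -22" line from either dump parses to the function name do_restore_coord, which makes it easy to see that both kernels fail at the same step and only the line number moved.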
* Re: [c/r]A problem met when using linux c/r
From: Oren Laadan @ 2009-10-27 13:47 UTC
To: Liu Aleaxander; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Liu,

Liu Aleaxander wrote:
> I checked it again (BTW, I found some new typos, too; I'll patch them later),
> but it didn't work either. Well, at least it succeeded in checkpointing, but
> it failed in restarting. The restart command was followed directly by an
> error:
> $ ./self_restart < self.image
> Killed
>
> And here is a small dump of dmesg:
> [4959:4959:c/r:ckpt_read_obj:367] type 1 len 72(72,72)
> [4959:4959:c/r:_ckpt_read_obj:259] type 4 len 73(73,73)
> [4959:4959:c/r:_ckpt_read_obj:259] type 4 len 73(73,73)
> [4959:4959:c/r:_ckpt_read_obj:259] type 4 len 73(73,73)
> [4959:4959:c/r:ckpt_read_obj:367] type 2 len 16(16,16)
> [4959:4959:c/r:do_restore_coord:1176] restore header: 0
> [4959:4959:c/r:ckpt_read_obj:367] type 3 len 8(8,8)
> [4959:4959:c/r:do_restore_coord:1180] restore container: 0
> [4959:4959:c/r:ckpt_read_obj:367] type 101 len 16(16,16)
> [4959:4959:c/r:_ckpt_read_obj:259] type 4 len 32(32,32)
> [4959:4959:c/r:do_restore_coord:1184] restore tree: 24
> [4959:4959:c/r:do_restore_coord:1218] pre restore task: 0
> [4959:4959:c/r:ckpt_read_obj:367] type 102 len 64(64,64)
> [4959:4959:c/r:_ckpt_read_obj:259] type 5 len 24(24,24)
> [4959:4959:c/r:restore_task:879] task 0
> [4959:4959:c/r:do_restore_coord:1222] restore task: -22
> [4959:4959:c/r:walk_task_subtree:338] total 0 ret 0
> [4959:4959:c/r:clear_task_ctx:763] task 4959 clear checkpoint_ctx
> [4959:4959:c/r:do_restart:1347] restart err -22, exiting
> [4959:4959:c/r:do_restart:1354] sys_restart returns -22
> [4959:4959:c/r:restore_debug_free:141] 1 tasks registered, nr_tasks was 0
> nr_total 0
> [4959:4959:c/r:restore_debug_free:144] active pid was -1, ctx->errno -22
> [4959:4959:c/r:restore_debug_free:146] kflags 10 uflags 1 oflags 1
> [4959:4959:c/r:restore_debug_free:173] pid 4959 type Coord state Failed
>

Please try this patch:

commit 7a7048d9ec8d9f74e7521eb9756d24f24767a024
Author: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Date:   Tue Oct 27 09:42:28 2009 -0400

    c/r: self-restart to tolerate missing pgid

    In self-restart we don't generate ghost tasks. Instead we permit
    undefined pgid - tolerate inability to restore the pgid of the
    restarting process.

    Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>

diff --git a/checkpoint/process.c b/checkpoint/process.c
index 6b2ef4c..8e4a823 100644
--- a/checkpoint/process.c
+++ b/checkpoint/process.c
@@ -823,6 +823,10 @@ static int restore_task_pgid(struct ckpt_ctx *ctx)
 	}
 	write_unlock_irq(&tasklist_lock);
 
+	/* self-restart: be tolerant if old pgid isn't found */
+	if (ctx->uflags & RESTART_TASKSELF)
+		ret = 0;
+
 	return ret;
 }
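The effect of the patch can be modelled in user space as a small pure function: whatever the pgid lookup returned, a self-restart reports success. This is only a sketch of the added branch, not the kernel code itself. The function name is invented, and the flag value 0x1 is an assumption inferred from the "uflags 1" line in the self-restart logs; the real definition lives in the c/r patch set's headers and may differ.

```c
/*
 * Assumed value: the logs above print "uflags 1" for a self-restart,
 * which suggests bit 0, but this is not confirmed by the thread.
 */
#define RESTART_TASKSELF 0x1UL

/*
 * User-space model of the tail of restore_task_pgid() after the patch.
 * ret is whatever the pgid lookup produced; uflags are the restart flags.
 */
static int pgid_restore_result(int ret, unsigned long uflags)
{
	/* self-restart: be tolerant if old pgid isn't found */
	if (uflags & RESTART_TASKSELF)
		ret = 0;

	return ret;
}
```

Under these assumptions, a lookup failure of -22 becomes 0 when the self-restart flag is set and is passed through unchanged otherwise, which is the behaviour the commit message describes.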