* Re: [c/r]A problem met when using linux c/r
From: Liu Aleaxander @ 2009-10-27  2:21 UTC (permalink / raw)
  To: Oren Laadan,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Tue, Oct 27, 2009 at 8:46 AM, Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org> wrote:

> Hi,
>
> Thanks for your report!
>
> Liu Aleaxander wrote:
> > Hi orenl:
> >
> > I ran into a problem while trying the samples in
> > Documentation/checkpoint. I followed the instructions in usage.txt
> > to do a self-checkpoint, but it failed. Here is a dump of
> > /tmp/cr-test.out and self.image:
> >
> >     $ cat /tmp/cr-test.out
> >     hello, world (80.86)!
> >     count 0 (80.86)!
> >     count 1 (80.96)!
> >     count 2 (81.16)!
> >     ckpt: Invalid argument(-22)
> >
> >     $ cat self.image
> >     '[err -22]: not container init
> >
> >
> > Then I searched the Internet and read readme.txt again, and found that I
> > probably need to compile 'container' support into the kernel. Following the
> > instructions on the lxc main page [http://lxc.sourceforge.net/lxc.html],
> > I recompiled the kernel with 'container' support built in, but with no
> > luck: it still fails with the same error. I still think the problem is
> > with the container, but I don't know why or how to fix it. Can you please
> > tell me what is wrong, and how to use checkpoint/restart correctly? Thanks!
>
> The problem is in the sample code :(  In particular, it should use
> the CHECKPOINT_SUBTREE flag as an argument to the syscall, rather
> than pass a '0'.
>
> (See also the example in contrib/ directory in user-cr).
>

I checked it again (BTW, I found some new typos too; I'll send a patch later),
but it still doesn't work. Checkpointing now succeeds, at least, but restarting
fails. The restart command just prints an error:
$ ./self_restart < self.image
Killed

And here is a small dump of dmesg:
[4959:4959:c/r:ckpt_read_obj:367] type 1 len 72(72,72)
[4959:4959:c/r:_ckpt_read_obj:259] type 4 len 73(73,73)
[4959:4959:c/r:_ckpt_read_obj:259] type 4 len 73(73,73)
[4959:4959:c/r:_ckpt_read_obj:259] type 4 len 73(73,73)
[4959:4959:c/r:ckpt_read_obj:367] type 2 len 16(16,16)
[4959:4959:c/r:do_restore_coord:1176] restore header: 0
[4959:4959:c/r:ckpt_read_obj:367] type 3 len 8(8,8)
[4959:4959:c/r:do_restore_coord:1180] restore container: 0
[4959:4959:c/r:ckpt_read_obj:367] type 101 len 16(16,16)
[4959:4959:c/r:_ckpt_read_obj:259] type 4 len 32(32,32)
[4959:4959:c/r:do_restore_coord:1184] restore tree: 24
[4959:4959:c/r:do_restore_coord:1218] pre restore task: 0
[4959:4959:c/r:ckpt_read_obj:367] type 102 len 64(64,64)
[4959:4959:c/r:_ckpt_read_obj:259] type 5 len 24(24,24)
[4959:4959:c/r:restore_task:879] task 0
[4959:4959:c/r:do_restore_coord:1222] restore task: -22
[4959:4959:c/r:walk_task_subtree:338] total 0 ret 0
[4959:4959:c/r:clear_task_ctx:763] task 4959 clear checkpoint_ctx
[4959:4959:c/r:do_restart:1347] restart err -22, exiting
[4959:4959:c/r:do_restart:1354] sys_restart returns -22
[4959:4959:c/r:restore_debug_free:141] 1 tasks registered, nr_tasks was 0 nr_total 0
[4959:4959:c/r:restore_debug_free:144] active pid was -1, ctx->errno -22
[4959:4959:c/r:restore_debug_free:146] kflags 10 uflags 1 oflags 1
[4959:4959:c/r:restore_debug_free:173] pid 4959 type Coord state Failed
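
For anyone reading the dump: the -22 reported by "restore task" and
sys_restart is a negated errno value, and on Linux 22 is EINVAL, the same
"Invalid argument" that the checkpoint sample printed earlier. A minimal
sketch of decoding such a value in user space follows; kernel_err_text() is
a helper name made up for this note, not something from the c/r tree.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: map a kernel-style return code (0 or a negated
 * errno such as -22) to the matching errno description.
 */
static const char *kernel_err_text(int ret)
{
	/* The kernel reports failures as -errno; strerror() wants the
	 * positive number. */
	return strerror(ret < 0 ? -ret : ret);
}
```

Calling kernel_err_text(-22) gives the same string as strerror(EINVAL).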


> >
> >
> > BTW, I'm not sure the output of /tmp/cr-test.out in usage.txt is right.
> > Here is the output (from usage.txt):
> >     $ cat /tmp/cr-rest.out
> >     hello, world (85.46)!
> >     count 0 (85.46)!
> >     count 1 (85.56)!
> >     count 2 (85.76)!
> >     count 3 (86.46)!
> >
> > I think one more line is missing between count 2 and count 3, something
> > like "checkpoint ret: 1", since the second loop iteration calls
> > checkpoint and prints its return value to that file.
>
> Good catch...
>
> I pushed fixes to both issues to the git repository.
>
> Thanks,
>
> Oren.
>
> (p.s. please CC the containers mailing list in the future).
>
>


-- 
regards
Liu Aleaxander


* Re: [c/r]A problem met when using linux c/r
From: Liu Aleaxander @ 2009-10-27  3:14 UTC (permalink / raw)
  To: Oren Laadan,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Tue, Oct 27, 2009 at 10:21 AM, Liu Aleaxander <aleaxander-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> And here is a small dump of dmesg:

Oh, sorry, that dump is old (I hadn't compiled the kernel with the new patch
yet); here is the new one:
[2880:2880:c/r:ckpt_read_obj:370] type 1 len 72(72,72)
[2880:2880:c/r:_ckpt_read_obj:262] type 4 len 73(73,73)
[2880:2880:c/r:_ckpt_read_obj:262] type 4 len 73(73,73)
[2880:2880:c/r:_ckpt_read_obj:262] type 4 len 73(73,73)
[2880:2880:c/r:ckpt_read_obj:370] type 2 len 16(16,16)
[2880:2880:c/r:do_restore_coord:1185] restore header: 0
[2880:2880:c/r:ckpt_read_obj:370] type 3 len 8(8,8)
[2880:2880:c/r:do_restore_coord:1189] restore container: 0
[2880:2880:c/r:ckpt_read_obj:370] type 101 len 16(16,16)
[2880:2880:c/r:_ckpt_read_obj:262] type 4 len 32(32,32)
[2880:2880:c/r:do_restore_coord:1193] restore tree: 24
[2880:2880:c/r:do_restore_coord:1227] pre restore task: 0
[2880:2880:c/r:ckpt_read_obj:370] type 102 len 64(64,64)
[2880:2880:c/r:_ckpt_read_obj:262] type 5 len 24(24,24)
[2880:2880:c/r:restore_task:879] task 0
[2880:2880:c/r:do_restore_coord:1231] restore task: -22
[2880:2880:c/r:walk_task_subtree:338] total 0 ret 0
[2880:2880:c/r:clear_task_ctx:772] task 2880 clear checkpoint_ctx
[2880:2880:c/r:do_restart:1367] restart err -22, exiting
[2880:2880:c/r:do_restart:1374] sys_restart returns -22
[2880:2880:c/r:restore_debug_free:141] 1 tasks registered, nr_tasks was 0 nr_total 0
[2880:2880:c/r:restore_debug_free:144] active pid was -1, ctx->errno -22
[2880:2880:c/r:restore_debug_free:146] kflags 10 uflags 1 oflags 1
[2880:2880:c/r:restore_debug_free:173] pid 2880 type Coord state Failed



-- 
regards
Liu Aleaxander


* Re: [c/r]A problem met when using linux c/r
From: Oren Laadan @ 2009-10-27 13:47 UTC (permalink / raw)
  To: Liu Aleaxander; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Liu,


Liu Aleaxander wrote:
> I checked it again (BTW, I found some new typos too; I'll send a patch later),
> but it still doesn't work. Checkpointing now succeeds, at least, but restarting
> fails. The restart command just prints an error:
> $ ./self_restart < self.image
> Killed
> 
> And here is a small dump of dmesg:
> [4959:4959:c/r:ckpt_read_obj:367] type 1 len 72(72,72)
> [4959:4959:c/r:_ckpt_read_obj:259] type 4 len 73(73,73)
> [4959:4959:c/r:_ckpt_read_obj:259] type 4 len 73(73,73)
> [4959:4959:c/r:_ckpt_read_obj:259] type 4 len 73(73,73)
> [4959:4959:c/r:ckpt_read_obj:367] type 2 len 16(16,16)
> [4959:4959:c/r:do_restore_coord:1176] restore header: 0
> [4959:4959:c/r:ckpt_read_obj:367] type 3 len 8(8,8)
> [4959:4959:c/r:do_restore_coord:1180] restore container: 0
> [4959:4959:c/r:ckpt_read_obj:367] type 101 len 16(16,16)
> [4959:4959:c/r:_ckpt_read_obj:259] type 4 len 32(32,32)
> [4959:4959:c/r:do_restore_coord:1184] restore tree: 24
> [4959:4959:c/r:do_restore_coord:1218] pre restore task: 0
> [4959:4959:c/r:ckpt_read_obj:367] type 102 len 64(64,64)
> [4959:4959:c/r:_ckpt_read_obj:259] type 5 len 24(24,24)
> [4959:4959:c/r:restore_task:879] task 0
> [4959:4959:c/r:do_restore_coord:1222] restore task: -22
> [4959:4959:c/r:walk_task_subtree:338] total 0 ret 0
> [4959:4959:c/r:clear_task_ctx:763] task 4959 clear checkpoint_ctx
> [4959:4959:c/r:do_restart:1347] restart err -22, exiting
> [4959:4959:c/r:do_restart:1354] sys_restart returns -22
> [4959:4959:c/r:restore_debug_free:141] 1 tasks registered, nr_tasks was 0 nr_total 0
> [4959:4959:c/r:restore_debug_free:144] active pid was -1, ctx->errno -22
> [4959:4959:c/r:restore_debug_free:146] kflags 10 uflags 1 oflags 1
> [4959:4959:c/r:restore_debug_free:173] pid 4959 type Coord state Failed
> 

Please try this patch:

commit 7a7048d9ec8d9f74e7521eb9756d24f24767a024
Author: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Date:   Tue Oct 27 09:42:28 2009 -0400

    c/r: self-restart to tolerate missing pgid

    In self-restart we don't generate ghost tasks. Instead we permit
    undefined pgid - tolerate inability to restore the pgid of the
    restarting process.

    Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>

diff --git a/checkpoint/process.c b/checkpoint/process.c
index 6b2ef4c..8e4a823 100644
--- a/checkpoint/process.c
+++ b/checkpoint/process.c
@@ -823,6 +823,10 @@ static int restore_task_pgid(struct ckpt_ctx *ctx)
 	}
 	write_unlock_irq(&tasklist_lock);

+	/* self-restart: be tolerant if old pgid isn't found */
+	if (ctx->uflags & RESTART_TASKSELF)
+		ret = 0;
+
 	return ret;
 }
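
To see the effect of the hunk in isolation, here is a rough user-space
model of the tail of restore_task_pgid(). It is only a sketch: the struct
is a stand-in for the kernel's struct ckpt_ctx, restore_task_pgid_tail()
and its lookup_ret parameter are names invented for illustration, and the
value of RESTART_TASKSELF is an assumption (the dmesg above shows
"uflags 1" for a self-restart, but the real definition is in the c/r
headers).

```c
#include <errno.h>

/* Assumed value, for illustration only; see the c/r headers for the
 * real definition. */
#define RESTART_TASKSELF 0x1UL

/* Simplified stand-in for the kernel's struct ckpt_ctx. */
struct ckpt_ctx_model {
	unsigned long uflags;	/* flags passed by the restart caller */
};

/*
 * Model of the end of restore_task_pgid(). 'lookup_ret' stands for the
 * result of the pgid lookup done earlier in the function: 0 on success,
 * a negative errno if the old pgid was not found.
 */
static int restore_task_pgid_tail(const struct ckpt_ctx_model *ctx,
				  int lookup_ret)
{
	int ret = lookup_ret;

	/* self-restart: be tolerant if old pgid isn't found */
	if (ctx->uflags & RESTART_TASKSELF)
		ret = 0;

	return ret;
}
```

With RESTART_TASKSELF set, a failed lookup (-EINVAL) comes back as 0, so
the restart can proceed; without the flag the error is returned unchanged.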
