From: Oren Laadan <orenl@cs.columbia.edu>
To: Jiro SEKIBA <jir@dependable-os.net>
Cc: "Serge E. Hallyn" <serue@us.ibm.com>,
"containers@lists.linux-foundation.org"
<containers@lists.linux-foundation.org>,
Linux-Kernel <linux-kernel@vger.kernel.org>
Subject: Re: Linux Checkpoint-Restart - v19
Date: Fri, 19 Mar 2010 11:34:09 -0400
Message-ID: <4BA39971.2080402@cs.columbia.edu>
In-Reply-To: <EF179F3A-4FBA-4776-B7A4-48F5EF73DC9C@dependable-os.net>
Jiro SEKIBA wrote:
> Hi,
> On 2010/03/18, at 5:55, Serge E. Hallyn wrote:
>
>> Quoting Jiro SEKIBA (jir@dependable-os.net):
>>> Hi,
>>>
>>> Thank you for the prompt reply!
>>> Sorry I didn't post to containers@lists.linux-foundation.org.
>>>
>>> On 2010/03/16, at 7:55, Oren Laadan wrote:
>>>
>>>> Hi,
>>>>
>>>> Thanks for taking the time to evaluate c/r. You may want to also
>>>> try the latest, which is (as of now) ckpt-v20-rc2.
>>> Yeah, I'll eventually try to keep up with the latest,
>>> but I want to try the one you think is stable first anyway.
>>>
>>>> In the future, please CC the containers mailing list for issues
>>>> related to c/r, at "containers@lists.linux-foundation.org".
>>>>
>>>> Jiro SEKIBA wrote:
>>>>> Hi,
>>>>> I'm trying to evaluate external checkpoint/restart with the cr-v19 kernel.
>>>>> However, when I restart, I get a "Killed" message on stdout.
>>>>> Do you have any tips or clues that are not in
>>>>> Documentation/checkpoint/usage.txt?
>>>>> I'm using the kernel pulled from
>>>>> git://git.ncl.cs.columbia.edu/pub/git/linux-cr.git,
>>>>> checked out at the tag "ckpt-v19". The base distro is Ubuntu 9.10.
>>>>> I ran the self checkpoint/restart sample programs in Documentation/checkpoint.
>>>>> They work as described in usage.txt.
>>>>> However, I cannot make external checkpoint/restart work properly.
>>>>> I made the simple test program below and created a checkpoint externally
>>>>> using the program in Documentation/checkpoint/; the checkpoint file
>>>>> appears to be created properly.
>>>>> However, when I ran self_restart < ckpt.image, I got a "Killed" message.
>>>> If you take an external checkpoint, then you need to match it
>>>> with an external restart, as opposed to self_restart.
>>>>
>>>> Otherwise, restarting with self_restart from a checkpoint that is
>>>> not a self-checkpoint can yield unexpected results.
>>>>
>>>> Since you don't mention it in your post, I don't know if you are using
>>>> the tools from user-cr. If not, then you should use the 'checkpoint' and
>>>> 'restart' tools from there. It is available from:
>>>> git://git.ncl.cs.columbia.edu/pub/git/user-cr.git
>>>> (use the same branch as the one you used for linux-cr).
>>>>
>>>> Once you have the tools compiled, and you checkpoint with the
>>>> 'checkpoint' utility from there, you can restart with:
>>>> restart -v < ckpt.image
>>>>
>>> Thank you for the information.
>>> Actually, I was trying to create the checkpoint with the program in Documentation/checkpoint.
>>>
>>> Now I tried user-cr, with binaries compiled from the same tag (ckpt-v19).
>>> Creating the checkpoint looks OK and restart -v reports Success. Nice!
>>> However, the contents of /tmp/test.out never grow any further;
>>> they remain the same as when the checkpoint was created.
>>>
>>> I tried "./restart -F /cgroup/0 -v --no-pidns < ckpt.image" and got Success.
>>> cat /cgroup/0/tasks shows that there is a process.
>>> ps shows ./test, so it looks like it restarted.
>>>
>>> # ps axuww |grep $(cat /cgroup/0/tasks )
>>> root 7231 0.1 0.0 1588 64 pts/0 D 16:57 0:00 ./test
>>> root 7238 0.0 0.1 2716 660 pts/1 R+ 16:57 0:00 grep 7231
>>>
>>> Under /proc, there is one open file descriptor, and it is /tmp/test.out:
>>>
>>> # ls -l /proc/$(cat /cgroup/0/tasks)/fd
>>> total 0
>>> lrwx------ 1 root root 64 Mar 16 16:58 0 -> /tmp/test.out
>>>
>>> Nhh, it's close..
>>>
>>> I found that when I mount the cgroup with -o freezer, self_checkpoint doesn't work.
>>> It worked when I didn't mount the cgroup.
>>> Is that what you expect?
>> No, it is not. Can you tell us more about exactly how it fails?
>>
>
> OK, I've compared the dmesg output when self_restart works and when it doesn't.
> When it works, the filename is /tmp/cr-self.out:
>
> [ 401.522556] [2307:2307:c/r:ckpt_read_fname:571] read filename '/tmp/cr-self.out'
> [ 401.522558] [2307:2307:c/r:restore_open_fname:594] fname '/tmp/cr-self.out' flags 0x2
This means that restart wants to re-open the file /tmp/cr-self.out.
>
> However, when the contents of the file stay unchanged, the filename is
> /tmp/cr-self.out.orig, which is, of course, the original file bound to the original process.
>
> [ 1088.414250] [2951:2951:c/r:ckpt_read_fname:571] read filename '/tmp/cr-self.out.orig'
> [ 1088.414253] [2951:2951:c/r:restore_open_fname:594] fname '/tmp/cr-self.out.orig' flags 0x2
This means that restart wants to re-open the file /tmp/cr-self.out.orig.
Could it be that these two restart attempts use two distinct image files
as input?
The first one seems to correspond to something like:
1) start the test, 2) checkpoint, 3) mv file and cp file, 4) restart
The second one seems to correspond to something like:
1) start the test, 2) mv file and cp file, 3) checkpoint, 4) restart
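Why the order matters: an open file descriptor follows the file across a rename, not the name it was opened with. Assuming the checkpoint records the path the kernel reports for each open descriptor at checkpoint time (which is what the ckpt_read_fname lines above suggest), a mv done before the checkpoint changes the recorded name to the renamed file. A minimal Linux-only shell sketch of that effect, using a hypothetical file /tmp/demo.out rather than the actual test program:

```shell
#!/bin/sh
# Sketch only: /tmp/demo.out is a hypothetical file name, not the test
# program from this thread. Requires Linux /proc.
set -e
f=/tmp/demo.out
rm -f "$f" "$f.orig"

exec 3>"$f"        # stand-in for the test program: keep the file open on fd 3
mv "$f" "$f.orig"  # "mv file": rename while the file is still open
cp "$f.orig" "$f"  # "cp file": a new, different file now has the old name

# fd 3 still refers to the renamed file, so this is the name a checkpoint
# taken at this point would see for it.
target=$(readlink /proc/$$/fd/3)
echo "$target"     # prints /tmp/demo.out.orig

exec 3>&-          # close fd 3
rm -f "$f" "$f.orig"
```

This matches the second log: if the mv happened before the checkpoint, restart re-opens the .orig file, whose contents the new copy at the old name never sees.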
What is the actual error reported when it doesn't work (from restart
and from the kernel log)?
>
> I cannot reproduce it yet, but at least the cgroup freezer option doesn't have the effect I mentioned.
> Sorry if that confused you.
>
> I still cannot restart from an external checkpoint.
> I'll try v20 next time.
If it doesn't work, can you please describe again the exact order of
commands that you use and the reported error(s)?
Oren.
>
>> Maybe get the cr_tests (either from Oren's tree or from
>> git clone git://git.sr71.net/~hallyn/cr_tests.git), cd cr_tests,
>> make, cd simple, run ./ckpt and send us the contents of
>> /tmp/log, dmesg, and ckptinfo -ve /tmp/out ?
>
> I think it runs OK, but I'm sending it just in case.
> /tmp/log was empty, by the way.
>
> thanks
>
>>> Thank you again for the help!
>>> I feel I'd better use the latest...
>> -serge