From: "Serge E. Hallyn" <serue@us.ibm.com>
To: Matt Helsley <matthltc@us.ibm.com>
Cc: Serge Hallyn <serue@linux.vnet.ibm.com>,
Containers <containers@lists.osdl.org>,
LKML <linux-kernel@vger.kernel.org>,
Oren Laadan <orenl@cs.columbia.edu>,
Dave Hansen <dave@linux.vnet.ibm.com>,
Ingo Molnar <mingo@elte.hu>,
Christoph Hellwig <hch@infradead.org>,
Alexey Dobriyan <adobriyan@gmail.com>
Subject: Re: Ensuring c/r maintainability (WAS Re: [RFC][PATCH 00/11] track files for checkpointability)
Date: Fri, 13 Mar 2009 12:53:01 -0500
Message-ID: <20090313175301.GA13050@us.ibm.com>
In-Reply-To: <20090313063611.GH7561@us.ibm.com>
Quoting Matt Helsley (matthltc@us.ibm.com):
> On Thu, Mar 12, 2009 at 10:30:48AM -0500, Serge E. Hallyn wrote:
> > Quoting Cedric Le Goater (legoater@free.fr):
> > > >> And if Ingo's requirement is fulfilled, would any C/R patchset be acceptable ?
> > > >
> > > > Yup, no matter how hideous :) Ok not really.
> > > >
> > > > But the point was that it wasn't Dave not understanding Alexey's
> > > > suggestion, but Greg not understanding Ingo's. If you think Ingo's
> > > > goal isn't worthwhile or achievable, then argue that (as I am), don't
> > > > keep elaborating on something we all agree will be needed (Alexey's
> > > > suggestion or some other way of doing a true may-be-checkpointed test).
> > >
> > > I rather spend my time on enabling things rather than forbid them.
> >
> > That sure sounds productive. How could I argue with that.
> >
> > But wait, haven't several teams been doing that for years? So why is
> > c/r not in the upstream kernel? Could it be that ignoring the
> > upstream maintainers' concerns about (a) treating the feature as a
> > toy, (b) long-term maintainability, and (c) c/r becoming an impediment
> > to future features, and instead hacking away at our toy feature, is
> > *not* always the best course?
>
> I've been thinking about how we could make checkpoint/restart (c/r) more
> maintainable in the long-term. I've only come up with two ideas:
>
> I. Implement sparse-like __cr struct annotations for some compile-time checking.
>
> First we annotate structures which c/r needs to save. For example we might have:
>
> struct mm_struct {
> __cr struct vm_area_struct * mmap;
> struct rb_root mm_rb;
> struct vm_area_struct *mmap_cache;
> ...
> __cr unsigned long mmap_base;
> __cr unsigned long task_size;
> ..
> };
>
> The __cr annotations indicate fields of the mm_struct which must be
> saved during checkpoint restart. In fact, for non-pointer fields these
> annotations would be sufficient to generate c/r code.
>
> Next we would need a __cr_root annotation. These mark structures which
> the c/r code visits that determine the scope of c/r. If there is no path from a
> __cr annotation to a __cr_root annotation then we would conclude that c/r of
> this struct is broken. These path constraint checks could be done at compile
> time.
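(For illustration only: the path constraint quoted above amounts to a reachability check over struct types. The sketch below is a userspace model, not a sparse or compiler plugin. The names cr_type, cr_mark and cr_check are hypothetical, and it assumes the checker has already collected, for each struct, whether it has __cr fields and which other structs its __cr pointer fields refer to.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_TYPES 8

/* Hypothetical model: one entry per struct type seen by the checker. */
struct cr_type {
	const char *name;
	bool is_root;           /* carries a __cr_root annotation */
	bool has_cr_fields;     /* has at least one __cr field */
	int ncr_refs;           /* number of valid entries in cr_refs */
	int cr_refs[MAX_TYPES]; /* types referenced by __cr pointer fields */
};

/* Mark every type reachable from idx by following __cr pointer fields. */
void cr_mark(const struct cr_type *types, int idx, bool *seen)
{
	if (seen[idx])
		return;
	seen[idx] = true;
	assert(types[idx].ncr_refs <= MAX_TYPES);
	for (int i = 0; i < types[idx].ncr_refs; i++)
		cr_mark(types, types[idx].cr_refs[i], seen);
}

/* Count types with __cr fields that no __cr_root type can reach. */
int cr_check(const struct cr_type *types, int n)
{
	bool seen[MAX_TYPES] = { false };
	int broken = 0;

	assert(n <= MAX_TYPES);
	for (int i = 0; i < n; i++)
		if (types[i].is_root)
			cr_mark(types, i, seen);

	for (int i = 0; i < n; i++) {
		if (types[i].has_cr_fields && !seen[i]) {
			printf("c/r broken: %s has no path to a __cr_root\n",
			       types[i].name);
			broken++;
		}
	}
	return broken;
}
```

(Fed a table in which a root struct references mm_struct, mm_struct references vm_area_struct, and some other struct has __cr fields but is referenced by nothing, cr_check() would report that last struct. Which struct would actually carry __cr_root is not stated in the proposal; treating task_struct as the root is an assumption.)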
Hi Matt,
is what you're detecting here really something we're worried about?
Maybe that's something we should be doing - coming up with a list of
the things we are trying to detect or prevent. I can only think of
a few offhand:
1. checkpoint (and restart) a task which has used a resource which we
   cannot (yet, or ever) safely checkpoint/restart.
2. kernel has a new feature for which we have not considered
   checkpoint/restart. Not only is it not safe to c/r a task using it,
   but we haven't even implemented a check for tasks using it.
3. Some new kernel feature has an attribute which simply must be
   stored away. An example would be the vdso_base in s390 as of
   recent 2.6.29 rc's, which was not present in 2.6.28. So there are
   two things to worry about in this one:
	a. detect that this happened and handle it, so c/r continues
	   to work.
	b. figure out a way to restart an older c/r image on a newer
	   kernel - or simply detect older images and call them
	   incompatible.
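For 3b, the "detect older images and call them incompatible" option could be as small as a version field in the image header. A minimal sketch, assuming a header layout that does not exist in any posted patch: cr_image_hdr, CR_IMAGE_MAGIC, CR_IMAGE_VERSION and cr_check_image are all made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values; a real format would define its own. */
#define CR_IMAGE_MAGIC   0x43522131u
#define CR_IMAGE_VERSION 2u /* bumped whenever saved state changes */

/* Hypothetical fixed header at the start of a checkpoint image. */
struct cr_image_hdr {
	uint32_t magic;
	uint32_t version;
};

enum cr_compat {
	CR_COMPAT_OK,  /* image matches this kernel's format */
	CR_COMPAT_OLD, /* image predates this kernel's format */
	CR_COMPAT_BAD, /* not an image, or from a newer format */
};

/* Classify an image header before any restart work is attempted. */
enum cr_compat cr_check_image(const struct cr_image_hdr *hdr)
{
	assert(hdr != NULL);
	if (hdr->magic != CR_IMAGE_MAGIC)
		return CR_COMPAT_BAD;
	if (hdr->version == CR_IMAGE_VERSION)
		return CR_COMPAT_OK;
	if (hdr->version < CR_IMAGE_VERSION)
		return CR_COMPAT_OLD;
	return CR_COMPAT_BAD;
}
```

With this, adding something like vdso_base to the saved state would mean bumping the version, and restart could refuse CR_COMPAT_OLD images outright or hand them to a conversion path if one is ever written.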
-serge