Linux Container Development
From: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
To: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>,
	"Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>,
	"Serge E. Hallyn"
	<serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>,
	Nathan Lynch <nathanl-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org>
Subject: Re: C/R: File substitution at restart
Date: Thu, 9 Sep 2010 04:02:20 -0700
Message-ID: <20100909110220.GF8957@count0.beaverton.ibm.com>
In-Reply-To: <20100909103720.GF4812-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>

On Thu, Sep 09, 2010 at 12:37:20PM +0200, Louis Rilling wrote:
> On 08/09/10 21:06 -0700, Matt Helsley wrote:
> > On Wed, Sep 08, 2010 at 08:03:52PM -0500, Serge E. Hallyn wrote:
> > > Quoting Matt Helsley (matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
> > > > On Wed, Sep 08, 2010 at 08:09:31AM -0500, Serge E. Hallyn wrote:
> > > > I think it can be split into two composable pieces which may also be
> > > > useful independently.
> > > > 
> > > > The first uses the fcntl() interface to add a flag like
> > > > O_CLOEXEC. Unlike O_CLOEXEC it marks an fd for preservation during
> > > > restart. That way we don't have to specify an fd number and a "source"
> > > > to the kernel. Just tell the kernel to keep the fd. The source can
> > > > be opened and dup2'd via userspace. This is useful without the
> > > > second piece if we want to simply add rather than replace an fd.
> > > 
> > > Can you think of any other use for this flag other than restart?
> > 
> > <joking>
> > I can't think of any other uses for O_CLOEXEC.
> > </joking>
> > 
> > Seriously though, restart will be used _much_ less often than exec so yes
> > it does seem like a waste of a valuable bit and something that wouldn't
> > quite belong in an fcntl interface.
> > 
> > However we can try to be a tad clever -- we could (ab|re)use O_CLOEXEC.
> > Right now restart closes all file descriptors and pays absolutely
> > no attention to O_CLOEXEC. We could reuse O_CLOEXEC to mean O_CLOREST
> > too. Have user-cr's restart tool mark all unwanted fds O_CLOEXEC. Any we
> > want to keep we do not mark with O_CLOEXEC.
> 
> This would also be useful at checkpoint, to tell sys_checkpoint() which fds
> should be ignored, being because it is not supported or because the application
> has a better way to deal with it.

True. Though unlike restart I don't think we can just (ab|re)use O_CLOEXEC
for that purpose.
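
For illustration, the userspace half of the O_CLOEXEC-as-O_CLOREST idea
quoted above might look roughly like the sketch below. It is only a sketch
under assumptions: mark_unwanted_fds() and its helpers are made-up names,
not user-cr code, and the kernel side (restart actually honouring the
flag) does not exist today. At the fcntl() level the per-fd bit is
FD_CLOEXEC, set with F_SETFD.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if fd appears in keep[0..nkeep). */
bool fd_in_set(int fd, const int *keep, size_t nkeep)
{
    for (size_t i = 0; i < nkeep; i++)
        if (keep[i] == fd)
            return true;
    return false;
}

/* Set or clear FD_CLOEXEC on fd. Returns 0 on success, -1 on error. */
int set_cloexec(int fd, bool on)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    flags = on ? (flags | FD_CLOEXEC) : (flags & ~FD_CLOEXEC);
    return fcntl(fd, F_SETFD, flags) < 0 ? -1 : 0;
}

/*
 * Hypothetical restart-tool helper: for every open fd in [0, max_fd),
 * set FD_CLOEXEC unless the fd is listed in keep[], in which case the
 * flag is cleared. Returns the number of fds marked, or -1 on error.
 */
int mark_unwanted_fds(int max_fd, const int *keep, size_t nkeep)
{
    int marked = 0;

    assert(keep != NULL || nkeep == 0);
    for (int fd = 0; fd < max_fd; fd++) {
        bool keep_it;

        if (fcntl(fd, F_GETFD) < 0)
            continue;               /* fd is not open, nothing to do */
        keep_it = fd_in_set(fd, keep, nkeep);
        if (set_cloexec(fd, !keep_it) < 0)
            return -1;
        if (!keep_it)
            marked++;
    }
    return marked;
}
```

The restart tool would open and dup2() the replacement files first, list
those fds in keep[], and call mark_unwanted_fds() just before invoking
restart.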

> 
> > 
> > 
> > Here's another idea which I haven't fully thought out yet.
> > 
> > We could introduce the concept of object id substitutions in the image.
> > So the image would look like (going from file pos 0 at the top..):
> > 
> > 0 +-------------------------------+
> >   |                               |
> >                 .....
> >   +-------------------------------+
> >   |     <substitute object>       | <--- object with id == <substitute id>
> >                 .....
> >   +---------------+---------------+
> >   |  <object id>  |<substitute id>|
> >   +---------------+---------------+
> >                 .....
> >   +---------------+---------------+
> >   |     <object to ignore>        | <-- object with id == <object id>
> >                 .....
> > 
> > (The above is ignoring the ckpt_hdr fields..)
> > 
> > When we read the image during restart we use the substitute ids to
> > create indirect objhash entries. When we encounter an obj id and
> > it refers to an indirect entry we first parse the object (ignoring
> > errors and dropping references on new objhash insertions), flip
> > a bit on the indirect entry (indicating the object has been parsed),
> > and then lookup the substitute id and return whatever that resolved to.
> > 
> > We can ignore the new objhash objects by making the objhash have its
> > own operation struct. When we're parsing an object that's been
> > substituted we just temporarily set the objhash add/lookup operations
> > to something suitable for properly dropping references to the new
> > object(s). This way we don't have to add checks for this peculiar
> > need all over the checkpoint/restart code. Sure it'll be slower...
> 
> If at checkpoint we can take care to ignore files that we know will be
> substituted, this should not be that slower.

So, would you say typically it's the application developer who knows
what to ignore? Are we expecting distros/packagers to be able to set
that up? Admins? These specific optimizations seem like they would be a
bit fragile unless the application developer is involved.
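
For reference, the indirect objhash entries from the quoted proposal can
be modelled in userspace along the lines below. This is a toy under
stated assumptions, not the checkpoint/restart objhash: every toy_* name
is invented for the example, the table is a fixed-size array, and the
step where the ignored object is parsed and its references dropped is
reduced to flipping the "parsed" bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_OBJS 16     /* arbitrary size for the sketch */

/* One slot in a toy, userspace stand-in for the objhash. */
struct toy_entry {
    bool used;
    bool indirect;          /* true: redirect lookups to substitute_id */
    bool parsed;            /* set once the ignored object has been seen */
    int id;
    int substitute_id;      /* only meaningful when indirect */
    void *obj;              /* only meaningful when !indirect */
};

struct toy_objhash {
    struct toy_entry slots[TOY_MAX_OBJS];
};

/* Linear search; returns NULL if id is not present. */
struct toy_entry *toy_find(struct toy_objhash *h, int id)
{
    assert(h != NULL);
    for (size_t i = 0; i < TOY_MAX_OBJS; i++)
        if (h->slots[i].used && h->slots[i].id == id)
            return &h->slots[i];
    return NULL;
}

/* Claim a free slot for id; NULL if id already exists or table is full. */
static struct toy_entry *toy_claim(struct toy_objhash *h, int id)
{
    if (toy_find(h, id))
        return NULL;
    for (size_t i = 0; i < TOY_MAX_OBJS; i++) {
        if (!h->slots[i].used) {
            h->slots[i] = (struct toy_entry){ .used = true, .id = id };
            return &h->slots[i];
        }
    }
    return NULL;
}

/* Insert a normal (direct) object. Returns 0 on success, -1 on failure. */
int toy_add_object(struct toy_objhash *h, int id, void *obj)
{
    struct toy_entry *e = toy_claim(h, id);

    if (!e)
        return -1;
    e->obj = obj;
    return 0;
}

/* Record an <object id, substitute id> pair as an indirect entry. */
int toy_add_substitution(struct toy_objhash *h, int id, int substitute_id)
{
    struct toy_entry *e = toy_claim(h, id);

    if (!e)
        return -1;
    e->indirect = true;
    e->substitute_id = substitute_id;
    return 0;
}

/* Resolve id, following one level of substitution if it is indirect. */
void *toy_resolve(struct toy_objhash *h, int id)
{
    struct toy_entry *e = toy_find(h, id);
    struct toy_entry *sub;

    if (!e)
        return NULL;
    if (!e->indirect)
        return e->obj;
    /* A real restart would parse and discard the ignored object here. */
    e->parsed = true;
    sub = toy_find(h, e->substitute_id);
    return (sub && !sub->indirect) ? sub->obj : NULL;
}
```

Following the image layout in the diagram, a reader would call
toy_add_object() for the substitute, then toy_add_substitution() for the
pair, and toy_resolve() when the ignored object's id shows up later.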

Cheers,
	-Matt Helsley

