From: "Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
To: Ram Pai <linuxram-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>,
	Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>,
	Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>,
	Dave Hansen <haveblue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Cc: Linux Containers <containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org>
Subject: checkpoint/restart of mounts
Date: Mon, 1 Mar 2010 12:44:31 -0600
Message-ID: <20100301184431.GA18902@us.ibm.com>

I've been thinking about the implementation of checkpoint/restart of
mounts.  There are a few issues I wanted to solicit input on.

First, there is a question about what exactly we want to checkpoint.
From a higher level, I really like the idea of requiring that everything
except proc, tmpfs, and devpts be a bind mount from the container's
parent mounts namespace.  That way restart can be completely independent
of devices and fs layout, and /bin/restart or lxc-restart or whatever
can just recreate the mnt/directory structure of the parent.  Then
the kernel can just slice and dice with bind mount.

But let's assume the container has /tmp2 bind-mounted on /tmp.  As near
as I can tell, asking for the path of the source of that bind mount is
like asking what the real filename of an inode is - there is no single
reliable answer.  So my plan right now is to record the maj:min and
the device-relative pathname - in other words, the info we have in
/proc/self/mountinfo.  The problem is that this makes us dependent on
devices.

I think we'll have to deal with that with translation of checkpoint
images.
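
To make the maj:min plus device-relative pathname idea concrete, here is
a rough userspace sketch that pulls those two values out of mountinfo
text.  The names BindSource and bind_sources and the sample lines are
made up for illustration; this is not code from any existing c/r tree.

```python
from typing import NamedTuple


class BindSource(NamedTuple):
    mount_point: str
    dev: str    # "major:minor" of the backing device
    root: str   # mounted subtree, relative to that filesystem's root


def bind_sources(mountinfo_text):
    """Return one BindSource per mountinfo line."""
    found = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        # Fields: mount ID, parent ID, maj:min, root, mount point, options, ...
        if len(fields) < 6:
            continue
        found.append(BindSource(fields[4], fields[2], fields[3]))
    return found


# Made-up sample: /tmp2 on device 8:1 bind-mounted on /tmp.
SAMPLE = """\
20 1 8:1 / / rw,relatime - ext3 /dev/sda1 rw
45 20 8:1 /tmp2 /tmp rw,relatime - ext3 /dev/sda1 rw
"""

for src in bind_sources(SAMPLE):
    print(src.mount_point, src.dev, src.root)
```

The record for /tmp carries only "8:1" and "/tmp2", which is exactly why
an image built this way is tied to the device numbering on the host
where the checkpoint was taken.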

Second, mount changes caused by the host.  Let's say the container was
created with /var/spool being a mount (mount --bind . .) and that
/var/spool is either a shared or slave mount.  Now, after the container
has been started, the host does a mount --bind /usr/spool/mail /var/spool/mail.
A few ways we could deal with that:

	1. We refuse checkpoint of a container which has any mounts
	   propagation escaping the container.  That'll turn into one
	   very ugly check, but should be do-able.  However, it is not
	   100% reliable.  In particular, after the bind mount above,
	   the container could have done mount --make-rprivate /var/spool.
	   Now checkpoint will not catch the past propagation leak,
	   and restart will be 'wrong'.

	2. A wrapper around the checkpoint program records the mounts
	   which existed when the container was started, and records
	   any changes at the time of checkpoint.
	
	3. (save your 'yuck's please :) We only allow mounts - or maybe
	   mounts propagation - checkpoint relative to either a
	   previous checkpoint, or some sort of configuration file
	   showing the initial state of mounts.

	   So perhaps if you want mounts c/r in a container, you must
	   start the container in a frozen state, do your first checkpoint
	   before the container's init starts up, and then do incremental
	   checkpoints from there.
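
For option 2, the wrapper could be as simple as a diff of two mountinfo
snapshots.  A minimal sketch, assuming the wrapper saved the container's
mountinfo text at start and reads it again at checkpoint; mount_keys,
diff_mounts and the sample lines are hypothetical:

```python
def mount_keys(mountinfo_text):
    """Return a set of (mount point, maj:min, root) tuples."""
    keys = set()
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue
        keys.add((fields[4], fields[2], fields[3]))
    return keys


def diff_mounts(at_start, at_checkpoint):
    """Return (added, removed) mounts between the two snapshots."""
    before = mount_keys(at_start)
    after = mount_keys(at_checkpoint)
    return sorted(after - before), sorted(before - after)


# Made-up snapshots: the host bind-mounts /usr/spool/mail after startup.
START = """\
20 1 8:1 / / rw - ext3 /dev/sda1 rw
31 20 8:1 /var/spool /var/spool rw shared:7 - ext3 /dev/sda1 rw
"""
CHECKPOINT = START + (
    "52 31 8:1 /usr/spool/mail /var/spool/mail rw shared:8"
    " - ext3 /dev/sda1 rw\n"
)

added, removed = diff_mounts(START, CHECKPOINT)
print("added:", added)
print("removed:", removed)
```

Note this only sees the net difference between the two points in time; a
mount that appeared and went away again in between is invisible to it.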

Third, there is the issue of mounts propagation in general.  I suspect
the only sane thing to do is to require that propagation into and out
of the container is set up correctly by /bin/restart - not our problem
how that is done - and then we can re-create propagation between mounts
in all mounts namespaces which are isolated inside the container.
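
One way to see which propagation crosses the container boundary is to
compare the shared:N and master:N tags in the optional fields of
mountinfo on both sides.  The sketch below assumes peer group IDs are
comparable across mounts namespaces on one host, and it only sees mounts
visible in the two files it is given.  The function names and sample
data are invented for illustration.

```python
import re

# Optional mountinfo fields that name a propagation peer group.
_TAG = re.compile(r"^(shared|master):(\d+)$")


def peer_group_ids(mountinfo_text):
    """Collect peer group IDs referenced by shared:N or master:N tags."""
    ids = set()
    for line in mountinfo_text.splitlines():
        # Optional fields sit between the mount options and the "-" separator.
        left, sep, _ = line.partition(" - ")
        if not sep:
            continue
        for field in left.split()[6:]:
            match = _TAG.match(field)
            if match:
                ids.add(int(match.group(2)))
    return ids


def crossing_groups(container_text, outside_text):
    """Peer groups referenced both inside and outside the container."""
    return sorted(peer_group_ids(container_text) & peer_group_ids(outside_text))


# Made-up data: /var/spool in the container is a slave of host group 7.
CONTAINER = """\
61 60 8:1 / / rw shared:12 - ext3 /dev/sda1 rw
62 61 8:1 /var/spool /var/spool rw master:7 - ext3 /dev/sda1 rw
"""
HOST = """\
20 1 8:1 / / rw shared:3 - ext3 /dev/sda1 rw
31 20 8:1 /var/spool /var/spool rw shared:7 - ext3 /dev/sda1 rw
"""

print(crossing_groups(CONTAINER, HOST))
```

Groups that show up on one side only (12 and 3 in the sample) would be
the ones restart can re-create on its own; anything in the returned list
is what /bin/restart would have to set up beforehand.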

Finally, it isn't lost on me that we may have everything we need in
userspace through /proc/self/mountinfo.  In fact, we can even tell
mounts namespaces since /proc/$$/mountinfo will give us different
mount ids for / in different mounts namespaces.  So perhaps we can
have user-cr/restart.c do the CLONE_NEWNS and restore mounts.
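
As a rough illustration of the mount-id trick, the sketch below compares
the mount ID reported for / in two mountinfo texts.  It is a heuristic
only, built on the observation above; root_mount_id, same_mounts_ns and
the sample lines are hypothetical names and data.

```python
def root_mount_id(mountinfo_text):
    """Return the mount ID of the first entry mounted at "/", or None."""
    for line in mountinfo_text.splitlines():
        fields = line.split()
        # Field 0 is the mount ID, field 4 is the mount point.
        if len(fields) >= 5 and fields[4] == "/":
            return int(fields[0])
    return None


def same_mounts_ns(text_a, text_b):
    """Guess whether two mountinfo texts describe the same mounts namespace."""
    id_a = root_mount_id(text_a)
    id_b = root_mount_id(text_b)
    return id_a is not None and id_a == id_b


# Made-up contents of /proc/<pid>/mountinfo for two tasks.
TASK_A = "20 1 8:1 / / rw - ext3 /dev/sda1 rw\n"
TASK_B = "61 60 8:1 / / rw - ext3 /dev/sda1 rw\n"

print(same_mounts_ns(TASK_A, TASK_A))
print(same_mounts_ns(TASK_A, TASK_B))
```

In a real tool the two strings would come from reading
/proc/<pid>/mountinfo for each task in the container, and tasks could
then be grouped by the ID returned.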

Comments?

thanks,
-serge
