linux-fsdevel.vger.kernel.org archive mirror
From: Matt Helsley <matthltc@us.ibm.com>
To: Andreas Dilger <adilger@sun.com>
Cc: Oren Laadan <orenl@cs.columbia.edu>,
	linux-fsdevel@vger.kernel.org,
	containers@lists.linux-foundation.org,
	Matt Helsley <matthltc@us.ibm.com>
Subject: Re: [C/R v20][PATCH 38/96] c/r: dump open file descriptors
Date: Fri, 19 Mar 2010 21:43:10 -0700
Message-ID: <20100320044310.GC2887@count0.beaverton.ibm.com>
In-Reply-To: <F18D161D-850B-4C82-83D5-1F19D573E84F@sun.com>

On Fri, Mar 19, 2010 at 05:19:22PM -0600, Andreas Dilger wrote:
> On 2010-03-18, at 18:59, Oren Laadan wrote:
> >+int checkpoint_fname(struct ckpt_ctx *ctx, struct path *path,
> >struct path *root)
> >+{
> >+	fname = ckpt_fill_fname(path, root, buf, &flen);
> >+	if (!IS_ERR(fname)) {
> >+		ret = ckpt_write_obj_type(ctx, fname, flen,
> >+					  CKPT_HDR_FILE_NAME);
> 
> What is the intended use case for the checkpoint/restore being
> developed here?  It seems like a major risk to do the checkpoint

Yes, as you anticipated below, we want to be able to migrate the
image to a similar node.

> using the filename, since this is not guaranteed to stay constant
> and the restore may give you a different state than what was running
> when the checkpoint was done.  Storing a file handle in the

We're aware of this.

Our assumption is userspace will freeze the filesystem and/or take
suitable snapshots (e.g. with btrfs) while the tasks being checkpointed
are also frozen. If userspace wants to freeze everything but the task
performing the checkpoint then that's fine too.
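
As a rough illustration of the userspace side (not part of this patch
set), one way a checkpoint tool could freeze and later thaw a filesystem
is the FIFREEZE/FITHAW ioctl pair from <linux/fs.h>. The helper name
set_fs_frozen() is hypothetical, the caller is assumed to have
CAP_SYS_ADMIN, and the filesystem is assumed to support freezing:

```c
#include <fcntl.h>
#include <linux/fs.h>	/* FIFREEZE, FITHAW */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: freeze (freeze != 0) or thaw (freeze == 0) the
 * filesystem that contains the path 'mnt'.
 * Returns 0 on success, -1 on any error (errno is set by open/ioctl).
 */
static int set_fs_frozen(const char *mnt, int freeze)
{
	int fd, ret;

	fd = open(mnt, O_RDONLY);
	if (fd < 0)
		return -1;

	/* The third argument is ignored by both ioctls. */
	ret = ioctl(fd, freeze ? FIFREEZE : FITHAW, 0);
	close(fd);

	return ret < 0 ? -1 : 0;
}
```

A tool would call set_fs_frozen(mnt, 1) once the tasks are frozen, take
its snapshot or copy, and then call set_fs_frozen(mnt, 0). Freezing the
filesystem that holds the checkpoint image itself would block the write,
so the image would have to live elsewhere.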

We decided to have userspace checkpoint the filesystem contents because
it will likely take an extraordinarily long time. We anticipate that
userspace will want to take advantage of many time-saving strategies
which would be impossible to anticipate perfectly for our kernel
syscall ABI.

Even though a wide set of time-saving strategies is available,
the goal is to keep the checkpoint image format and content
independent of the tools that perform migration.

> checkpoint, instead of (or in addition to) the filename would allow
> restoring the state correctly.
>
> Note that you would also need to store some kind of FSID as part of
> the file handle, which is a functionality that would be desirable
> for Aneesh's recent open_by_handle() patches as well, so getting
> this right once would be of use to both projects.

I haven't looked at those, sorry. It may be useful but I think
there's room for adding that in the future as you hinted above.
My guess is, depending on the environment of the restarting machine,
an FSID might not even be enough. Again -- I need to find some time
to review those patches before I can be sure :).

Userspace coordinates the management of the nodes and thus knows
best how to map things like major:minor, /dev/foo, and/or
UUIDs to the appropriate "things" when it comes time to restart.
The best the kernel can do is provide all of those so that userspace
can make the choices it needs to. However, most of that information is
already available via /proc in mountinfo or via other userspace tools,
so we neither save it in the image nor provide new interfaces to
get it.
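
To make the mountinfo point concrete, here is a minimal sketch of how a
userspace tool might pull major:minor, mount point, filesystem type and
mount source out of one line of /proc/self/mountinfo. The struct and
function names are hypothetical; the field layout follows the format
documented in proc(5), where a " - " separator ends the optional fields:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record holding the fields a restart tool might map. */
struct mnt_entry {
	unsigned int dev_major, dev_minor;
	char mount_point[256];
	char fstype[64];
	char source[256];
};

/*
 * Parse one mountinfo line, e.g.
 *   36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw
 * Returns 0 on success, -1 if the line does not match the format.
 */
static int parse_mountinfo_line(const char *line, struct mnt_entry *e)
{
	char root[256];
	int id, parent;
	const char *sep;

	/* Fixed leading fields: id, parent id, major:minor, root, mount point */
	if (sscanf(line, "%d %d %u:%u %255s %255s", &id, &parent,
		   &e->dev_major, &e->dev_minor, root, e->mount_point) != 6)
		return -1;

	/* Optional fields vary in number; skip to the " - " separator. */
	sep = strstr(line, " - ");
	if (!sep)
		return -1;

	if (sscanf(sep + 3, "%63s %255s", e->fstype, e->source) != 2)
		return -1;

	return 0;
}
```

A tool would read /proc/self/mountinfo line by line with fgets(), call
parse_mountinfo_line() on each, and record the result next to the
checkpoint image so that the restarting node can map the saved
filenames onto its own mounts.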

> That said, if the intent is to allow the restore to be done on
> another node with a "similar" filesystem (e.g. created by rsync/node
> image), instead of having a coherent distributed filesystem on all
> of the nodes then the filename makes sense.

Yes, this is the intent.

> I would recommend to store both the file handle+FSID and the
> filename, preferring the former for "100% correct" restores on the
> same node, and the latter for being able to restore on a similar
> node (e.g. system files and such that are expected to be the same on
> all nodes, but do not necessarily have the same inode number).

This sounds like a good idea for the future. However, I do not think
inclusion of our patches should be predicated on this since the patches
are still useful for local restart (thanks to things like mount namespaces)
and migration without file handles.

Thanks for having a look at these!

Cheers,
	-Matt Helsley
