From: Matt Helsley <matthltc@us.ibm.com>
To: Jamie Lokier <jamie@shareable.org>
Cc: Matt Helsley <matthltc@us.ibm.com>,
Andreas Dilger <adilger@sun.com>,
Oren Laadan <orenl@cs.columbia.edu>,
linux-fsdevel@vger.kernel.org,
containers@lists.linux-foundation.org
Subject: Re: [C/R v20][PATCH 38/96] c/r: dump open file descriptors
Date: Sun, 21 Mar 2010 20:37:24 -0700 [thread overview]
Message-ID: <20100322033724.GA20796@count0.beaverton.ibm.com> (raw)
In-Reply-To: <20100322022003.GA16462@shareable.org>
On Mon, Mar 22, 2010 at 02:20:03AM +0000, Jamie Lokier wrote:
> Matt Helsley wrote:
> > On Sun, Mar 21, 2010 at 05:27:03PM +0000, Jamie Lokier wrote:
> > > Matt Helsley wrote:
> > > > > That said, if the intent is to allow the restore to be done on
> > > > > another node with a "similar" filesystem (e.g. created by rsync/node
> > > > > image), instead of having a coherent distributed filesystem on all
> > > > > of the nodes then the filename makes sense.
> > > >
> > > > Yes, this is the intent.
> > >
> > > I would worry about programs which are using files which have been
> > > deleted, renamed, or (very common) renamed-over by another process
> > > after being opened, as there's a good chance they will successfully
> > > open the wrong file after c/r, and corrupt state from then on.
> >
> > The code in the patches does check for unlinked files and refuses
> > to checkpoint if an unlinked file is open. Yes, this limits the usefulness
> > of the code somewhat but it's a problem we can solve and c/r is still quite
> > useful without the solution.
> >
> > We've done our best to try and reach that ideal. You're welcome to have a
> > look at the code to see if you can find any ways in which we haven't.
> > Here's the code that refuses to checkpoint unsupported files. I think
> > it's pretty easy to read:
>
> From a very quick read,
>
> > if (d_unlinked(file->f_dentry)) {
> > ckpt_err(ctx, -EBADF, "%(T)%(P)Unlinked files unsupported\n",
> > file);
>
> Hmm.
>
> I wonder if d_unlinked() is always true for a file which is opened,
> unlinked or renamed over, but has a hard link to it from elsewhere so
> the on-disk file hasn't gone away.
Well, if the on-disk file hasn't gone away due to a hardlink then we
won't need to save the file in the checkpoint image -- the filesystem
content backup done during checkpoint should also get the file contents.
>
> I guess it probably is. That's kinda neat! I'd hoped there would be a
> good reason for f_dentry eventually ;-)
>
> What about files opened through /proc/self/fd/N before or after the
> original file was unlinked/renamed-over. Where does the dentry point?
Before the unlink it will result in the same file being opened. If it's
opened by a task being checkpointed then we'll be in the same situation
as the "self" task. If it's opened by a task not being checkpointed then
the "leak detection" code will notice that there's an unaccounted reference
to the file and checkpoint will fail.
That code is in checkpoint/objhash.c. It works by doing two passes:
1. Collect references
2. Checkpoint referenced objects
We only do the second pass if the ref count matches the number of times
we've "collected" the file (I added comments to the .ref_foo = ops so you
don't need to see them to get the idea):
static struct ckpt_obj_ops ckpt_obj_ops[] = {
...
/* file object */
{
.obj_name = "FILE",
.obj_type = CKPT_OBJ_FILE,
.ref_drop = obj_file_drop, /* aka fput */
.ref_grab = obj_file_grab, /* aka get_file */
.ref_users = obj_file_users, /* does atomic read of f_count */
.checkpoint = checkpoint_file,
.restore = restore_file,
},
...
};
...
/**
* ckpt_obj_contained - test if shared objects are contained in checkpoint
* @ctx: checkpoint context
*
* Loops through all objects in the table and compares the number of
* references accumulated during checkpoint, with the reference count
* reported by the kernel.
*
* Return 1 if respective counts match for all objects, 0 otherwise.
*/
int ckpt_obj_contained(struct ckpt_ctx *ctx)
{
struct ckpt_obj *obj;
struct hlist_node *node;
/* account for ctx->{file,logfile} (if in the table already) */
ckpt_obj_users_inc(ctx, ctx->file, 1);
if (ctx->logfile)
ckpt_obj_users_inc(ctx, ctx->logfile, 1);
/* account for ctx->root_nsproxy (if in the table already) */
ckpt_obj_users_inc(ctx, ctx->root_nsproxy, 1);
hlist_for_each_entry(obj, node, &ctx->obj_hash->list, next) {
if (!obj->ops->ref_users)
continue;
if (obj->ops->obj_type == CKPT_OBJ_SOCK)
obj_sock_adjust_users(obj);
if (obj->ops->ref_users(obj->ptr) != obj->users) {
ckpt_err(ctx, -EBUSY,
"%(O)%(P)%(S)Usage leak (%d != %d)\n",
obj->objref, obj->ptr, obj->ops->obj_name,
obj->ops->ref_users(obj->ptr), obj->users);
return 0;
}
}
return 1;
}
...
So that hopefully addresses your questions regarding the use of the symlinks
before the unlink.
After the unlink those symlinks are broken since they have "(deleted)"
appended. Making sure they are broken after restart is one detail I've
thought about. To make it perfect I think we could:
1. Move any existing file at the original symlinked path to a
temporary location.
2. Restore the "unlinked" file to that location.
(in quotes since it's not unlinked yet)
3. Open the "unlinked" file.
4. Unlink the file again.
5. Move the existing file back from the temporary location.
As with relinking, we need a good way to do the "temporary location".
That is complicated because we need to choose a location that we have
permission to write to, always exists during restart, and is guaranteed
not to have files in it. Relinking the file shifts these problems from
restart to checkpoint.
In case you're bored, before Oren posted these patches I wrote:
https://ckpt.wiki.kernel.org/index.php/Checklist/UnlinkedFiles
and there's lots of info related to what we do and don't support,
many related to files in one way or another, in the table at:
https://ckpt.wiki.kernel.org/index.php/Checklist
I'll update that page with some of my responses above. Getting your
thoughts on my ideas outlined above would be excellent. If you've
got some counter proposals I'd be happy to hear them too. I'll add
a reference to this thread and an edited collection of my rambling
responses to that page if you like.
Cheers,
-Matt Helsley
next prev parent reply other threads:[~2010-03-22 3:37 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-03-19 0:59 [C/R v20][PATCH 00/96] Linux Checkpoint-Restart - v20 Oren Laadan
2010-03-19 0:59 ` [C/R v20][PATCH 20/96] c/r: make file_pos_read/write() public Oren Laadan
2010-03-22 6:31 ` Nick Piggin
2010-03-23 0:12 ` Oren Laadan
2010-03-23 0:43 ` Nick Piggin
2010-03-23 0:56 ` Oren Laadan
2010-03-19 0:59 ` [C/R v20][PATCH 37/96] c/r: introduce new 'file_operations': ->checkpoint, ->collect() Oren Laadan
2010-03-22 6:34 ` Nick Piggin
2010-03-22 10:16 ` Matt Helsley
2010-03-22 11:00 ` Nick Piggin
2010-03-19 0:59 ` [C/R v20][PATCH 38/96] c/r: dump open file descriptors Oren Laadan
2010-03-19 23:19 ` Andreas Dilger
2010-03-20 4:43 ` Matt Helsley
2010-03-21 17:27 ` Jamie Lokier
2010-03-21 19:40 ` Serge E. Hallyn
2010-03-21 20:58 ` Daniel Lezcano
2010-03-21 21:36 ` Oren Laadan
[not found] ` <4BA6914D.8040007-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-03-21 23:31 ` xing lin
2010-03-22 8:40 ` Daniel Lezcano
2010-03-22 2:12 ` Matt Helsley
2010-03-22 13:51 ` Jamie Lokier
2010-03-22 23:18 ` Andreas Dilger
2010-03-22 1:06 ` Matt Helsley
2010-03-22 2:20 ` Jamie Lokier
2010-03-22 3:37 ` Matt Helsley [this message]
2010-03-22 14:13 ` Jamie Lokier
2010-03-22 2:55 ` Serge E. Hallyn
[not found] ` <1268960401-16680-4-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-03-22 10:30 ` Nick Piggin
2010-03-22 13:22 ` Matt Helsley
2010-03-22 13:38 ` Nick Piggin
2010-03-19 0:59 ` [C/R v20][PATCH 39/96] c/r: restore " Oren Laadan
2010-03-19 0:59 ` [C/R v20][PATCH 40/96] c/r: introduce method '->checkpoint()' in struct vm_operations_struct Oren Laadan
2010-03-19 0:59 ` [C/R v20][PATCH 44/96] c/r: add generic '->checkpoint' f_op to ext fses Oren Laadan
2010-03-19 0:59 ` [C/R v20][PATCH 45/96] c/r: add generic '->checkpoint()' f_op to simple devices Oren Laadan
2010-03-19 0:59 ` [C/R v20][PATCH 46/96] c/r: add checkpoint operation for opened files of generic filesystems Oren Laadan
2010-03-19 0:59 ` [C/R v20][PATCH 50/96] splice: export pipe/file-to-pipe/file functionality Oren Laadan
2010-03-19 0:59 ` [C/R v20][PATCH 51/96] c/r: support for open pipes Oren Laadan
2010-03-19 0:59 ` [C/R v20][PATCH 52/96] c/r: checkpoint and restore FIFOs Oren Laadan
2010-03-19 0:59 ` [C/R v20][PATCH 53/96] c/r: refuse to checkpoint if monitoring directories with dnotify Oren Laadan
2010-03-19 0:59 ` [C/R v20][PATCH 66/96] c/r: restore file->f_cred Oren Laadan
2010-03-19 0:59 ` [C/R v20][PATCH 82/96] c/r: checkpoint/restart epoll sets Oren Laadan
2010-03-19 0:59 ` [C/R v20][PATCH 83/96] c/r: checkpoint/restart eventfd Oren Laadan
2010-03-19 1:00 ` [C/R v20][PATCH 84/96] c/r: restore task fs_root and pwd (v3) Oren Laadan
2010-03-19 1:00 ` [C/R v20][PATCH 85/96] c/r: preliminary support mounts namespace Oren Laadan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100322033724.GA20796@count0.beaverton.ibm.com \
--to=matthltc@us.ibm.com \
--cc=adilger@sun.com \
--cc=containers@lists.linux-foundation.org \
--cc=jamie@shareable.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=orenl@cs.columbia.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).