linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Matt Helsley <matthltc@us.ibm.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Oren Laadan <orenl@cs.columbia.edu>,
	linux-fsdevel@vger.kernel.org,
	containers@lists.linux-foundation.org,
	Matt Helsley <matthltc@us.ibm.com>,
	Andreas Dilger <adilger@sun.com>
Subject: Re: [C/R v20][PATCH 38/96] c/r: dump open file descriptors
Date: Mon, 22 Mar 2010 06:22:32 -0700	[thread overview]
Message-ID: <20100322132232.GD20796@count0.beaverton.ibm.com> (raw)
In-Reply-To: <20100322103035.GF17637@laptop>

On Mon, Mar 22, 2010 at 09:30:35PM +1100, Nick Piggin wrote:
> On Thu, Mar 18, 2010 at 08:59:47PM -0400, Oren Laadan wrote:
> > @@ -531,6 +533,15 @@ static int init_checkpoint_ctx(struct ckpt_ctx *ctx, pid_t pid)
> >  		return -EINVAL;  /* cleanup by ckpt_ctx_free() */
> >  	}
> >  
> > +	/* root vfs (FIX: WILL CHANGE with mnt-ns etc */
> > +	task_lock(ctx->root_task);
> > +	fs = ctx->root_task->fs;
> > +	read_lock(&fs->lock);
> > +	ctx->root_fs_path = fs->root;
> > +	path_get(&ctx->root_fs_path);
> > +	read_unlock(&fs->lock);
> > +	task_unlock(ctx->root_task);
> > +
> >  	return 0;
> >  }
> >  
> > diff --git a/checkpoint/files.c b/checkpoint/files.c
> > new file mode 100644
> > index 0000000..7a57b24
> > --- /dev/null
> > +++ b/checkpoint/files.c
> > +char *ckpt_fill_fname(struct path *path, struct path *root, char *buf, int *len)
> > +{
> > +	struct path tmp = *root;
> > +	char *fname;
> > +
> > +	BUG_ON(!buf);
> > +	spin_lock(&dcache_lock);
> > +	fname = __d_path(path, &tmp, buf, *len);
> > +	spin_unlock(&dcache_lock);
> > +	if (IS_ERR(fname))
> > +		return fname;
> > +	*len = (buf + (*len) - fname);
> > +	/*
> > +	 * FIX: if __d_path() changed these, it must have stepped out of
> > +	 * init's namespace. Since currently we require a unified namespace
> > +	 * within the container: simply fail.
> > +	 */
> > +	if (tmp.mnt != root->mnt || tmp.dentry != root->dentry)
> > +		fname = ERR_PTR(-EBADF);
> 
> Maybe something like this is better in fs/?
> 
> 
> > +static int scan_fds(struct files_struct *files, int **fdtable)
> > +{
> > +	struct fdtable *fdt;
> > +	int *fds = NULL;
> > +	int i = 0, n = 0;
> > +	int tot = CKPT_DEFAULT_FDTABLE;
> > +
> > +	/*
> > +	 * We assume that all tasks possibly sharing the file table are
> > +	 * frozen (or we are a single process and we checkpoint ourselves).
> > +	 * Therefore, we can safely proceed after krealloc() from where we
> > +	 * left off. Otherwise the file table may be modified by another
> > +	 * task after we scan it. The behavior is this case is undefined,
> > +	 * and either checkpoint or restart will likely fail.
> > +	 */
> > + retry:
> > +	fds = krealloc(fds, tot * sizeof(*fds), GFP_KERNEL);
> > +	if (!fds)
> > +		return -ENOMEM;
> > +
> > +	rcu_read_lock();
> > +	fdt = files_fdtable(files);
> > +	for (/**/; i < fdt->max_fds; i++) {
> > +		if (!fcheck_files(files, i))
> > +			continue;
> > +		if (n == tot) {
> > +			rcu_read_unlock();
> > +			tot *= 2;	/* won't overflow: kmalloc will fail */
> > +			goto retry;
> > +		}
> > +		fds[n++] = i;
> > +	}
> > +	rcu_read_unlock();
> 
> ...
> 
> > +static int checkpoint_file_desc(struct ckpt_ctx *ctx,
> > +				struct files_struct *files, int fd)
> > +{
> > +	struct ckpt_hdr_file_desc *h;
> > +	struct file *file = NULL;
> > +	struct fdtable *fdt;
> > +	int objref, ret;
> > +	int coe = 0;	/* avoid gcc warning */
> > +	pid_t pid;
> > +
> > +	h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_FILE_DESC);
> > +	if (!h)
> > +		return -ENOMEM;
> > +
> > +	rcu_read_lock();
> > +	fdt = files_fdtable(files);
> > +	file = fcheck_files(files, fd);
> > +	if (file) {
> > +		coe = FD_ISSET(fd, fdt->close_on_exec);
> > +		get_file(file);
> > +	}
> > +	rcu_read_unlock();
> > +
> > +	ret = find_locks_with_owner(file, files);
> > +	/*
> > +	 * find_locks_with_owner() returns an error when there
> > +	 * are no locks found, so we *want* it to return an error
> > +	 * code.  Its success means we have to fail the checkpoint.
> > +	 */
> > +	if (!ret) {
> > +		ret = -EBADF;
> > +		ckpt_err(ctx, ret, "%(T)fd %d has file lock or lease\n", fd);
> > +		goto out;
> > +	}
> > +
> > +	/* sanity check (although this shouldn't happen) */
> > +	ret = -EBADF;
> > +	if (!file) {
> > +		ckpt_err(ctx, ret, "%(T)fd %d gone?\n", fd);
> > +		goto out;
> > +	}
> > +
> > +	/*
> > +	 * TODO: Implement c/r of fowner and f_sigio.  Should be
> > +	 * trivial, but for now we just refuse its checkpoint
> > +	 */
> > +	pid = f_getown(file);
> > +	if (pid) {
> > +		ret = -EBUSY;
> > +		ckpt_err(ctx, ret, "%(T)fd %d has an owner (%d)\n", fd);
> > +		goto out;
> > +	}
> > +
> > +	/*
> > +	 * if seen first time, this will add 'file' to the objhash, keep
> > +	 * a reference to it, dump its state while at it.
> > +	 */
> 
> All these kinds of things (including above hunks) IMO are nasty to put
> outside fs/. It would be nice to see higher level functionality
> implemented in fs and exported to your checkpoint stuff.

Agreed. I posted a series of patches that reorganized the non-filesystem
checkpoint/restart code by distributing it to more appropriate places.
If you can stomach web interfaces have a look at:

	http://thread.gmane.org/gmane.linux.kernel.containers/16617

It'll take a some effort to reorganize and retest ckpt-v20 as I did for
v19. Then I've got to do the same for the filesystem portions. I think
that would complete the reorganization.

> Apparently it's hard because checkpointing is so incestuous with
> everything, but that's why it's important to structure the code well.

You're saying it's difficult to organize because it's got to work with
quite a few disparate VFS structures? My impression is the code breaks
down pretty well along existing lines (fds, fd table, struct files...).
The main problems are resolving the effects of CONFIG_CHECKPOINT=n and
header inclusion messes.

Cheers,
	-Matt Helsley

  reply	other threads:[~2010-03-22 13:22 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-03-19  0:59 [C/R v20][PATCH 00/96] Linux Checkpoint-Restart - v20 Oren Laadan
2010-03-19  0:59 ` [C/R v20][PATCH 20/96] c/r: make file_pos_read/write() public Oren Laadan
2010-03-22  6:31   ` Nick Piggin
2010-03-23  0:12     ` Oren Laadan
2010-03-23  0:43       ` Nick Piggin
2010-03-23  0:56         ` Oren Laadan
2010-03-19  0:59 ` [C/R v20][PATCH 37/96] c/r: introduce new 'file_operations': ->checkpoint, ->collect() Oren Laadan
2010-03-22  6:34   ` Nick Piggin
2010-03-22 10:16     ` Matt Helsley
2010-03-22 11:00       ` Nick Piggin
2010-03-19  0:59 ` [C/R v20][PATCH 38/96] c/r: dump open file descriptors Oren Laadan
2010-03-19 23:19   ` Andreas Dilger
2010-03-20  4:43     ` Matt Helsley
2010-03-21 17:27       ` Jamie Lokier
2010-03-21 19:40         ` Serge E. Hallyn
2010-03-21 20:58           ` Daniel Lezcano
2010-03-21 21:36             ` Oren Laadan
     [not found]               ` <4BA6914D.8040007-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-03-21 23:31                 ` xing lin
2010-03-22  8:40               ` Daniel Lezcano
2010-03-22  2:12             ` Matt Helsley
2010-03-22 13:51               ` Jamie Lokier
2010-03-22 23:18               ` Andreas Dilger
2010-03-22  1:06         ` Matt Helsley
2010-03-22  2:20           ` Jamie Lokier
2010-03-22  3:37             ` Matt Helsley
2010-03-22 14:13               ` Jamie Lokier
2010-03-22  2:55           ` Serge E. Hallyn
     [not found]   ` <1268960401-16680-4-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-03-22 10:30     ` Nick Piggin
2010-03-22 13:22       ` Matt Helsley [this message]
2010-03-22 13:38         ` Nick Piggin
2010-03-19  0:59 ` [C/R v20][PATCH 39/96] c/r: restore " Oren Laadan
2010-03-19  0:59 ` [C/R v20][PATCH 40/96] c/r: introduce method '->checkpoint()' in struct vm_operations_struct Oren Laadan
2010-03-19  0:59 ` [C/R v20][PATCH 44/96] c/r: add generic '->checkpoint' f_op to ext fses Oren Laadan
2010-03-19  0:59 ` [C/R v20][PATCH 45/96] c/r: add generic '->checkpoint()' f_op to simple devices Oren Laadan
2010-03-19  0:59 ` [C/R v20][PATCH 46/96] c/r: add checkpoint operation for opened files of generic filesystems Oren Laadan
2010-03-19  0:59 ` [C/R v20][PATCH 50/96] splice: export pipe/file-to-pipe/file functionality Oren Laadan
2010-03-19  0:59 ` [C/R v20][PATCH 51/96] c/r: support for open pipes Oren Laadan
2010-03-19  0:59 ` [C/R v20][PATCH 52/96] c/r: checkpoint and restore FIFOs Oren Laadan
2010-03-19  0:59 ` [C/R v20][PATCH 53/96] c/r: refuse to checkpoint if monitoring directories with dnotify Oren Laadan
2010-03-19  0:59 ` [C/R v20][PATCH 66/96] c/r: restore file->f_cred Oren Laadan
2010-03-19  0:59 ` [C/R v20][PATCH 82/96] c/r: checkpoint/restart epoll sets Oren Laadan
2010-03-19  0:59 ` [C/R v20][PATCH 83/96] c/r: checkpoint/restart eventfd Oren Laadan
2010-03-19  1:00 ` [C/R v20][PATCH 84/96] c/r: restore task fs_root and pwd (v3) Oren Laadan
2010-03-19  1:00 ` [C/R v20][PATCH 85/96] c/r: preliminary support mounts namespace Oren Laadan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100322132232.GD20796@count0.beaverton.ibm.com \
    --to=matthltc@us.ibm.com \
    --cc=adilger@sun.com \
    --cc=containers@lists.linux-foundation.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=npiggin@suse.de \
    --cc=orenl@cs.columbia.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).