All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
To: Matt Helsley <matthltc@us.ibm.com>
Cc: containers@lists.linux-foundation.org,
	Eric Sandeen <sandeen@redhat.com>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>,
	Miklos Szeredi <miklos@szeredi.hu>,
	Jamie Lokier <jamie@shareable.org>,
	Amir Goldstein <amir73il@users.sf.net>,
	Christoph Hellwig <hch@infradead.org>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	linux-fsdevel@vger.kernel.org, Jan Kara <jack@suse.cz>,
	linux-ext4@vger.kernel.org, Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH 5/6] [RFC] Checkpoint/restart unlinked files
Date: Fri, 22 Oct 2010 16:43:44 -0700	[thread overview]
Message-ID: <20101022234344.GA23663@us.ibm.com> (raw)
In-Reply-To: <7da8a050193a91b69ecb3899ce2eda541ecd2473.1285278339.git.matthltc@us.ibm.com>

Matt Helsley [matthltc@us.ibm.com] wrote:
<snip>

| To understand why relinking is extremely useful for checkpoint/restart
| consider this simple pseudocode program and a specific example checkpoint
| of it:

I can see how relinking the file simplifies C/R :-) But patch 2 indicates
not all filesystems can support relink. Hope they aren't too many of those.

| 
| 	a_fd = open("a"); /* example: size of the file at "a" is 1GB */
| 	link("a", "b");
| 	unlink("a");
| 	creat("a");
| 	             <---- example: checkpoint happens here
| 	write(a_fd, "bar");
| 
| The file "a" is unlinked and a different file has been placed at that
| path. a_fd still refers to the inode shared with "b".
| 
| Without relinking we would need to walk the entire filesystem to find out
| that "b" is a path to the same inode 

You may want to mention here that to checkpoint/restart a file, we save/
restore the pathname. So finding a path for the unliked file 'a' would
require walking the entire filesystem to find any alias.

| (another variation on this case: "b"
| would also have been unlinked). We'd need to do this for every
| unlinked file that remains open in every task to checkpoint. Even then
| there is no guarantee such a "b" exists for every unlinked file -- the
| inodes could be "orphans" -- and we'd need to preserve their contents
| some other way.
| 
| I considered a couple alternatives to preserving unlinked file contents:

s/couple/couple of/

| copying and file handles. Each has significant drawbacks.
| 
| First I attempted to copy the file contents into the image and then
| recreate and unlink the file during restart. Using a simple version of
| that method the write above would not reach "b". One fix would be to search
| the filesystem for a file with the same inode number (inode of "b") and
| either open it or hardlink it to "a". Another would be to record the inode
| number. This either shifts the search from checkpoint time to restart time
| or has all the drawbacks of the second method I considered: file handles.
| 
| Instead of copying contents or recording inodes I also considered using
| file handles. We'd need to ensure that the filehandles persist in storage,
| can be snapshotted/backed up, and can be migrated. Can handlefs or any
| generic file handle system do this? My _guess_ is "no" but folks are
| welcome to tell me I'm wrong.
| 
| In contrast, linking the file from a_fd back into its filesystem can avoid
| these complexities. Relinking avoids the search for matching inodes and
| copying large quantities of data from storage only to write it back (in
| fact the data would be read-and-written twice -- once for checkpoint and
| once for restart). Like file handles it does require changes to the
| filesystem code. Unlike file handles, enabling relinking does not require
| every filesystem to support a new kind of filesystem "object" -- only
| an operation that is quite similar to one that already exists: link.
| 
| Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
| Cc: Eric Sandeen <sandeen@redhat.com>
| Cc: Theodore Ts'o <tytso@mit.edu>
| Cc: Andreas Dilger <adilger.kernel@dilger.ca>
| Cc: linux-ext4@vger.kernel.org
| Cc: Jan Kara <jack@suse.cz>
| Cc: containers@lists.linux-foundation.org
| Cc: Oren Laadan <orenl@cs.columbia.edu>
| Cc: linux-fsdevel@vger.kernel.org
| Cc: Al Viro <viro@zeniv.linux.org.uk>
| Cc: Christoph Hellwig <hch@infradead.org>
| Cc: Jamie Lokier <jamie@shareable.org>
| Cc: Amir Goldstein <amir73il@users.sf.net>
| Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
| Cc: Miklos Szeredi <miklos@szeredi.hu>
| ---
|  fs/checkpoint.c                  |   51 ++++++++++++++-----
|  fs/namei.c                       |  102 ++++++++++++++++++++++++++++++++++++++
|  fs/pipe.c                        |    2 +-
|  include/linux/checkpoint.h       |    3 +-
|  include/linux/checkpoint_hdr.h   |    3 +
|  include/linux/checkpoint_types.h |    3 +
|  6 files changed, 149 insertions(+), 15 deletions(-)
| 
| diff --git a/fs/checkpoint.c b/fs/checkpoint.c
| index 87d7c6e..9c7caec 100644
| --- a/fs/checkpoint.c
| +++ b/fs/checkpoint.c
| @@ -16,6 +16,7 @@
|  #include <linux/sched.h>
|  #include <linux/file.h>
|  #include <linux/namei.h>
| +#include <linux/mount.h>
|  #include <linux/fs_struct.h>
|  #include <linux/fs.h>
|  #include <linux/fdtable.h>
| @@ -26,6 +27,7 @@
|  #include <linux/checkpoint.h>
|  #include <linux/eventpoll.h>
|  #include <linux/eventfd.h>
| +#include <linux/sys-wrapper.h>
|  #include <net/sock.h>
| 
|  /**************************************************************************
| @@ -174,6 +176,9 @@ int checkpoint_file_common(struct ckpt_ctx *ctx, struct file *file,
|  	h->f_pos = file->f_pos;
|  	h->f_version = file->f_version;
| 
| +	if (d_unlinked(file->f_dentry))
| +		/* Perform post-checkpoint and post-restart unlink() */
| +		h->f_restart_flags |= RESTART_FILE_F_UNLINK;
|  	h->f_credref = checkpoint_obj(ctx, f_cred, CKPT_OBJ_CRED);
|  	if (h->f_credref < 0)
|  		return h->f_credref;
| @@ -197,16 +202,6 @@ int generic_file_checkpoint(struct ckpt_ctx *ctx, struct file *file)
|  	struct ckpt_hdr_file_generic *h;
|  	int ret;
| 
| -	/*
| -	 * FIXME: when we'll add support for unlinked files/dirs, we'll
| -	 * need to distinguish between unlinked filed and unlinked dirs.
| -	 */
| -	if (d_unlinked(file->f_dentry)) {
| -		ckpt_err(ctx, -EBADF, "%(T)%(P)Unlinked files unsupported\n",
| -			 file);
| -		return -EBADF;
| -	}
| -
|  	h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_FILE);
|  	if (!h)
|  		return -ENOMEM;
| @@ -220,6 +215,9 @@ int generic_file_checkpoint(struct ckpt_ctx *ctx, struct file *file)
|  	if (ret < 0)
|  		goto out;
|  	ret = checkpoint_fname(ctx, &file->f_path, &ctx->root_fs_path);

Hmm, what file name will be checkpointed here, if the file is unlinked ?

| +	if (ret < 0)
| +		goto out;
| +	ret = checkpoint_file_links(ctx, file);
|   out:
|  	ckpt_hdr_put(ctx, h);
|  	return ret;
| @@ -570,9 +568,11 @@ static int ckpt_read_fname(struct ckpt_ctx *ctx, char **fname)
|  /**
|   * restore_open_fname - read a file name and open a file
|   * @ctx: checkpoint context
| + * @restore_unlinked: unlink the opened file
|   * @flags: file flags
|   */
| -struct file *restore_open_fname(struct ckpt_ctx *ctx, int flags)
| +struct file *restore_open_fname(struct ckpt_ctx *ctx,
| +				int restore_unlinked, int flags)

nit: s/restore_unlinked/unlinked/ ?

|  {
|  	struct file *file;
|  	char *fname;
| @@ -586,8 +586,33 @@ struct file *restore_open_fname(struct ckpt_ctx *ctx, int flags)
|  	if (len < 0)
|  		return ERR_PTR(len);
|  	ckpt_debug("fname '%s' flags %#x\n", fname, flags);
| -
| +	if (restore_unlinked) {
| +		kfree(fname);
| +		fname = NULL;
| +		len = ckpt_read_payload(ctx, (void **)&fname, PATH_MAX,
| +					CKPT_HDR_BUFFER);

Hmm, is there a reason we need a special way to read the file name for
unlinked files ? After re-linking the file during checkpoint, can we
not treat it like any other open file (except for the flag) ?

| +		if (len < 0)
| +			return ERR_PTR(len);
| +		fname[len] = '\0';
| +	}
|  	file = filp_open(fname, flags, 0);
| +	if (IS_ERR(file)) {
| +		ckpt_err(ctx, PTR_ERR(file), "Could not open file \"%s\"\n", fname);
| +
| +		goto out;
| +	}
| +	if (!restore_unlinked)
| +		goto out;
| +	if (S_ISDIR(file->f_mapping->host->i_mode))
| +		len = kernel_sys_rmdir(fname);
| +	else
| +		len = kernel_sys_unlink(fname);
| +	if (len < 0) {
| +		ckpt_err(ctx, len, "Could not unlink \"%s\"\n", fname);
| +		fput(file);
| +		file = ERR_PTR(len);
| +	}

nit: how about moving this unlink block to a smaller function ?

| +out:
|  	kfree(fname);
| 
|  	return file;
| @@ -692,7 +717,7 @@ static struct file *generic_file_restore(struct ckpt_ctx *ctx,
|  	    ptr->h.len != sizeof(*ptr) || ptr->f_type != CKPT_FILE_GENERIC)
|  		return ERR_PTR(-EINVAL);
| 
| -	file = restore_open_fname(ctx, ptr->f_flags);
| +	file = restore_open_fname(ctx, !!(ptr->f_restart_flags & RESTART_FILE_F_UNLINK), ptr->f_flags);

nit: long line

|  	if (IS_ERR(file))
|  		return file;
| 
| diff --git a/fs/namei.c b/fs/namei.c
| index 8c9663d..69c4f4e 100644
| --- a/fs/namei.c
| +++ b/fs/namei.c
| @@ -32,6 +32,9 @@
|  #include <linux/fcntl.h>
|  #include <linux/device_cgroup.h>
|  #include <linux/fs_struct.h>
| +#ifdef CONFIG_CHECKPOINT
| +#include <linux/checkpoint.h>
| +#endif
|  #include <asm/uaccess.h>
| 
|  #include "internal.h"
| @@ -2543,6 +2546,105 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
|  	return sys_linkat(AT_FDCWD, oldname, AT_FDCWD, newname, 0);
|  }
| 
| +#ifdef CONFIG_CHECKPOINT
| +
| +/* Path relative to the mounted filesystem's root -- not a "global" root or even a namespace root. The unique_name_count is unique for the entire checkpoint. */
| +#define CKPT_RELINKAT_FMT "lost+found/checkpoint/unlink-at-restart-%08llx-%u"
| +
| +static int checkpoint_fill_relink_fname(struct ckpt_ctx *ctx,

nit. since it is a static function, we could probably drop the 'checkpoint_'
prefix in the name ?

| +					struct file *for_file,
| +					char relink_dir_pathname[PATH_MAX],
| +					int *lenp)
| +{
| +	struct path relink_dir_path;

nit. since the function name has "relink", maybe variable names can skip
(code is easier to read with smaller variable names).

| +	char *tmp;
| +	int len;
| +
| +	/* Find path to mount */
| +	relink_dir_path.mnt = for_file->f_path.mnt;
| +	relink_dir_path.dentry = relink_dir_path.mnt->mnt_root;
| +	tmp = d_path(&relink_dir_path, relink_dir_pathname, PATH_MAX);
| +	if (IS_ERR(tmp))
| +		return PTR_ERR(tmp);
| +
| +	/* Append path to relinked file. */
| +	len = strlen(tmp);
| +	if (len <= 0)
| +		return -ENOENT;
| +	memmove(relink_dir_pathname, tmp, len);
| +	tmp = relink_dir_pathname + len - 1;
| +	/* Ensure we've got a single dir separator */
| +	if (*tmp == '/')
| +		tmp++;
| +	else {
| +		tmp++;

we could simplify the 'if-else' by making the tmp++ unconditional (or by
removing the -1 above).

| +		*tmp = '/';
| +		tmp++;
| +		len++;
| +	}
| +	len += snprintf(tmp, PATH_MAX - len, CKPT_RELINKAT_FMT,
| +			ctx->ktime_begin.tv64,
| +			 ++ctx->unique_name_count);

Since the format is dependent on additional parameters (tv64, unique_name_count)
any changes to the format will require updates in multiple places in the future
right ? That would make the CKPT_RELINKAT_FMT macro less useful.

Instead how about a function like this that could be used during both checkpoint
and restart: 

	static inline int generate_relinked_path(ctx, buf, len)
	{
		return sprintf(...);
	}

| +	relink_dir_pathname[len] = '\0';
| +	*lenp = len;
| +	return 0;
| +}
| +
| +static int checkpoint_file_relink(struct ckpt_ctx *ctx,
| +				  struct file *file,
| +				  char new_path[PATH_MAX])
| +{
| +	int ret, len;
| +
| +	/* 
| +	 * Relinking arbitrary files without searching a path
| +	 * (which non-existent if the file is unlinked) requires

s/which/which is/
s/file is/file was/

| +	 * special privileges.
| +	 */
| +	if (!capable(CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH)) {
| +		ckpt_err(ctx, -EPERM, "%(T)Relinking unlinked files requires CAP_DAC_{OVERRIDE,READ_SEARCH}\n");

nit: long line

| +		return -EPERM;
| +	}
nit: a blank line here might help
| +	ret = checkpoint_fill_relink_fname(ctx, file, new_path, &len);
| +	if (ret)
| +		return ret;
| +	ret = do_kern_linkat(&file->f_path, file->f_dentry,
| +			     AT_FDCWD, new_path, 0);
| +	if (ret)
| +		ckpt_err(ctx, ret, "%(T)%(P)%(V)Failed to relink unlinked file.\n", file, file->f_op);

nit: long line

| +	return ret;
| +}
| +
| +int checkpoint_file_links(struct ckpt_ctx *ctx, struct file *file)
| +{
| +	char *new_link_path;
| +	int ret, len;
| +
| +	if (!d_unlinked(file->f_dentry))
| +		return 0;
| +
| +	/*
| +	 * Unlinked files need at least one hardlink for the post-sys_checkpoint
| +	 * filesystem backup/snapshot.
| +	 */
| +	new_link_path = kmalloc(PATH_MAX, GFP_KERNEL);
| +	if (!new_link_path)
| +		return -ENOMEM;
| +	ret = checkpoint_file_relink(ctx, file, new_link_path);
| +	if (ret < 0)
| +		goto out_free;
| +	len = strlen(new_link_path);
| +	ret = ckpt_write_obj_type(ctx, NULL, len + 1, CKPT_HDR_BUFFER);
| +	if (ret < 0)
| +		goto out_free;
| +	ret = ckpt_kwrite(ctx, new_link_path, len + 1);
| +out_free:
| +	kfree(new_link_path);
| +
| +	return ret;
| +}

nit: some blank lines separating the different sections of the function will
help readability

Sukadev

  parent reply	other threads:[~2010-10-22 23:36 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-09-23 21:53 [PATCH 0/6] Relink unlinked files for checkpoint/restart support Matt Helsley
     [not found] ` <1285278812-16972-1-git-send-email-matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-09-23 21:53   ` [PATCH 1/6] Move sys-wrappers Matt Helsley
2010-09-23 21:53     ` [PATCH 2/6] [RFC] Create the .relink file_operation Matt Helsley
2010-09-29 20:19       ` Oren Laadan
2010-09-29 23:07         ` Matt Helsley
     [not found]         ` <4CA39F53.6040506-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-09-29 23:07           ` Matt Helsley
     [not found]       ` <d91e3660266d3a2956c4d1aebc9cb618081b21ef.1285278339.git.matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-09-26 19:08         ` Brad Boyer
2010-09-27 19:16           ` Matt Helsley
     [not found]             ` <20100927191628.GN23839-52DBMbEzqgQ/wnmkkaCWp/UQ3DHhIser@public.gmane.org>
2010-09-27 22:03               ` Brad Boyer
2010-09-27 22:03             ` Brad Boyer
     [not found]               ` <20100927220346.GA30726-gUHvAUm18WIVF38mYka5EQ@public.gmane.org>
2010-09-29 20:16                 ` Oren Laadan
2010-09-29 20:16               ` Oren Laadan
     [not found]           ` <20100926190837.GA9308-gUHvAUm18WIVF38mYka5EQ@public.gmane.org>
2010-09-27 19:16             ` Matt Helsley
2010-09-29 20:19         ` Oren Laadan
2010-09-23 21:53     ` [PATCH 3/6] [RFC] ext3/4: Allow relinking to unlinked files Matt Helsley
2010-09-23 21:53     ` [PATCH 4/6] [RFC] Split do_linkat() out of sys_linkat Matt Helsley
     [not found]       ` <9eade1f1ec10c23dae296feda8af9fe87085e843.1285278339.git.matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-09-29 20:28         ` Oren Laadan
2010-09-23 21:53     ` [PATCH 5/6] [RFC] Checkpoint/restart unlinked files Matt Helsley
2010-09-29 22:22       ` Oren Laadan
2010-09-30  1:17         ` Matt Helsley
2010-09-30 15:45           ` Oren Laadan
     [not found]           ` <20100930011711.GT23839-52DBMbEzqgQ/wnmkkaCWp/UQ3DHhIser@public.gmane.org>
2010-09-30 15:45             ` Oren Laadan
     [not found]         ` <4CA3BC2D.8010001-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-09-30  1:17           ` Matt Helsley
2010-10-22 23:43       ` Sukadev Bhattiprolu [this message]
     [not found]       ` <7da8a050193a91b69ecb3899ce2eda541ecd2473.1285278339.git.matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-09-29 22:22         ` Oren Laadan
2010-10-22 23:43         ` Sukadev Bhattiprolu
     [not found]     ` <6987185123220ec2034677299859c5a63eaf2aed.1285278339.git.matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-09-23 21:53       ` [PATCH 2/6] [RFC] Create the .relink file_operation Matt Helsley
2010-09-23 21:53       ` [PATCH 3/6] [RFC] ext3/4: Allow relinking to unlinked files Matt Helsley
2010-09-23 21:53       ` [PATCH 4/6] [RFC] Split do_linkat() out of sys_linkat Matt Helsley
2010-09-23 21:53       ` [PATCH 5/6] [RFC] Checkpoint/restart unlinked files Matt Helsley
2010-09-23 21:53       ` [PATCH 6/6] [RFC] Enable c/r of unlinked fifos Matt Helsley
2010-11-01 16:38   ` [PATCH 0/6] Relink unlinked files for checkpoint/restart support Oren Laadan
     [not found]     ` <4CCEECFF.9010909-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-11-01 20:43       ` Matt Helsley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101022234344.GA23663@us.ibm.com \
    --to=sukadev@linux.vnet.ibm.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=amir73il@users.sf.net \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=containers@lists.linux-foundation.org \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=jamie@shareable.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=matthltc@us.ibm.com \
    --cc=miklos@szeredi.hu \
    --cc=sandeen@redhat.com \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.