All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC cr-pipe-v13][PATCH 0/3] c/r of open pipes
@ 2009-01-27 21:23 Oren Laadan
       [not found] ` <1233091395-24582-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Oren Laadan @ 2009-01-27 21:23 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Dave Hansen

Adds support for chekcpoint-restart of open pipes.

This patchset applies on top of 'ckpt-v13' recent patchset.

A new test program (test4.c in userspace tools) provides a test
case for this new functionality.

Oren.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RFC cr-pipe-v13][PATCH 0/3] c/r of open pipes
       [not found] ` <1233091395-24582-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2009-01-27 21:23   ` Oren Laadan
  2009-01-27 21:23   ` [RFC cr-pipe-v13][PATCH 1/3] A new file type (CR_FD_OBJREF) for a file descriptor already setup Oren Laadan
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Oren Laadan @ 2009-01-27 21:23 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Dave Hansen

This patchset adds support for chekcpoint-restart of open pipes. 
It applies on top of 'ckpt-v13' patch set.

A new test program (test4.c in userspace tools) provides a test
case for this new functionality.

Oren.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RFC cr-pipe-v13][PATCH 1/3] A new file type (CR_FD_OBJREF) for a file descriptor already setup
       [not found] ` <1233091395-24582-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  2009-01-27 21:23   ` Oren Laadan
@ 2009-01-27 21:23   ` Oren Laadan
  2009-01-27 21:23   ` [RFC cr-pipe-v13][PATCH 2/3] Checkpoint open pipes Oren Laadan
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Oren Laadan @ 2009-01-27 21:23 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Dave Hansen

While file pointers are shared objects, they may share an underlying
object themselves. For instance, file pointers of both ends of a pipe
that share the same pipe inode. In this case, the shared entity to
handle is the inode that is shared among two file pointers (e.g read-
and write- ends). In this sort of "nested sharing" we need only save
the underlying object once (upon first encounter) on checkpoint, and
restore it only once during restart.

To checkpoint a file descriptor of this sort, we first lookup the
inode in the hash table:

If not found, it is the first encounter of this inode. Here, Besides
the file descriptor data, we also (a) register the inode in the hash
and save the corresponding 'objref' of this inode in '->fd_objref' of
the file descriptor. We then also (b) save the inode data, as per the
inode type (this is not implemented in this patch, as it depends on
the object). The file descriptor type will indicate the type of that
object (e.g. for a pipe, when supported, CR_FD_PIPE).

If found, it is the second encounter of this inode, e.g. in the case
of a pipe, as we hit the other end of the same pipe. At this point we
need only record the reference ('objref') to the inode that we had
saved before, and the file descriptor type is changed to CR_FD_OBJREF.

The logic during restart is similar: the '->fd_objref' is looked up in
the hash table. Unlike checkpoint, during restart the object that is
placed (and sought) in the hash table is the _file_ pointer, rather
than the _inode_.

If not found, it is the first encounter of this inode. Therefore we
(a) restore the inode data. Specifically, we construct a matching
object and end up with multiple file pointers (e.g. if the object is a
pipe, we will have both read- and write- ends). One of those is used
for the file descriptor in question; the other(s) will be deposited in
the hash table, to be retrieved and used later on. We also (b) register
the newly created inode in the hash table using the given 'objref'.

If found, then we can skip the setup of the underlying object that
is represented by the inode.

The type CR_FD_OBJREF indicates, on restart, that the corresponding
file descriptor is already setup and registered in the hash under the
'->fd_objref' that it had been assigned.

The next two patches use CR_FD_OBJREF to implement support for pipes.

Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
 checkpoint/ckpt_file.c         |   52 ++++++++++++++++++------
 checkpoint/objhash.c           |   30 +++++++++++---
 checkpoint/rstr_file.c         |   85 +++++++++++++++++++++++++++++++---------
 include/linux/checkpoint.h     |    1 +
 include/linux/checkpoint_hdr.h |    9 +++-
 5 files changed, 135 insertions(+), 42 deletions(-)

diff --git a/checkpoint/ckpt_file.c b/checkpoint/ckpt_file.c
index e3097ac..f5263f2 100644
--- a/checkpoint/ckpt_file.c
+++ b/checkpoint/ckpt_file.c
@@ -72,13 +72,31 @@ int cr_scan_fds(struct files_struct *files, int **fdtable)
 	return n;
 }
 
+/* cr_write_fd_file - for regular files, directories, symbolic links */
+static int cr_write_fd_file(struct cr_ctx *ctx, struct file *file, int parent)
+{
+	return cr_write_fname(ctx, &file->f_path, &ctx->fs_mnt);
+}
+
+/* cr_inode_to_fdtype - determine the fd type given an inode */
+static inline enum fd_type cr_inode_to_fdtype(struct inode *inode)
+{
+	switch (inode->i_mode & S_IFMT) {
+	case S_IFREG:
+		return CR_FD_FILE;
+	case S_IFDIR:
+		return CR_FD_DIR;
+	}
+	/* file type unsupported */
+	return -EBADF;
+}
+
 /* cr_write_fd_data - dump the state of a given file pointer */
 static int cr_write_fd_data(struct cr_ctx *ctx, struct file *file, int parent)
 {
 	struct cr_hdr h;
 	struct cr_hdr_fd_data *hh = cr_hbuf_get(ctx, sizeof(*hh));
-	struct dentry *dent = file->f_dentry;
-	struct inode *inode = dent->d_inode;
+	struct inode *inode = file->f_dentry->d_inode;
 	enum fd_type fd_type;
 	int ret;
 
@@ -92,27 +110,35 @@ static int cr_write_fd_data(struct cr_ctx *ctx, struct file *file, int parent)
 	hh->f_version = file->f_version;
 	/* FIX: need also file->uid, file->gid, file->f_owner, etc */
 
-	switch (inode->i_mode & S_IFMT) {
-	case S_IFREG:
-		fd_type = CR_FD_FILE;
-		break;
-	case S_IFDIR:
-		fd_type = CR_FD_DIR;
-		break;
-	default:
+	fd_type = cr_inode_to_fdtype(inode);
+	if (fd_type < 0) {
 		cr_hbuf_put(ctx, sizeof(*hh));
-		return -EBADF;
+		return fd_type;
 	}
 
-	/* FIX: check if the file/dir/link is unlinked */
 	hh->fd_type = fd_type;
+	hh->fd_objref = 0;
+
+	/* FIX: check if the inode is unlinked */
 
 	ret = cr_write_obj(ctx, &h, hh);
 	cr_hbuf_put(ctx, sizeof(*hh));
 	if (ret < 0)
 		return ret;
 
-	return cr_write_fname(ctx, &file->f_path, &ctx->fs_mnt);
+	switch (fd_type) {
+	case CR_FD_OBJREF:
+		/* nothing to do */
+		break;
+	case CR_FD_FILE:
+	case CR_FD_DIR:
+		ret = cr_write_fd_file(ctx, file, parent);
+		break;
+	default:
+		BUG();
+	}
+
+	return ret;
 }
 
 /**
diff --git a/checkpoint/objhash.c b/checkpoint/objhash.c
index ee31b38..01fd2dc 100644
--- a/checkpoint/objhash.c
+++ b/checkpoint/objhash.c
@@ -35,20 +35,31 @@ static void cr_obj_ref_drop(struct cr_objref *obj)
 	case CR_OBJ_FILE:
 		fput((struct file *) obj->ptr);
 		break;
+	case CR_OBJ_INODE:
+		iput((struct inode *) obj->ptr);
+		break;
 	default:
 		BUG();
 	}
 }
 
-static void cr_obj_ref_grab(struct cr_objref *obj)
+static int cr_obj_ref_grab(struct cr_objref *obj)
 {
+	int ret = 0;
+
 	switch (obj->type) {
 	case CR_OBJ_FILE:
 		get_file((struct file *) obj->ptr);
 		break;
+	case CR_OBJ_INODE:
+		if (!igrab((struct inode *) obj->ptr))
+			ret = -EBADF;
+		break;
 	default:
 		BUG();
 	}
+
+	return ret;
 }
 
 static void cr_objhash_clear(struct cr_objhash *objhash)
@@ -144,16 +155,22 @@ static struct cr_objref *cr_obj_new(struct cr_ctx *ctx, void *ptr, int objref,
 				    unsigned short type, unsigned short flags)
 {
 	struct cr_objref *obj;
-	int i;
+	int i, ret;
 
 	obj = kmalloc(sizeof(*obj), GFP_KERNEL);
 	if (!obj)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
 	obj->ptr = ptr;
 	obj->type = type;
 	obj->flags = flags;
 
+	ret = cr_obj_ref_grab(obj);
+	if (ret < 0) {
+		kfree(obj);
+		return ERR_PTR(ret);
+	}
+
 	if (objref) {
 		/* use @objref to index (restart) */
 		obj->objref = objref;
@@ -165,7 +182,6 @@ static struct cr_objref *cr_obj_new(struct cr_ctx *ctx, void *ptr, int objref,
 	}
 
 	hlist_add_head(&obj->hash, &ctx->objhash->head[i]);
-	cr_obj_ref_grab(obj);
 	return obj;
 }
 
@@ -198,8 +214,8 @@ int cr_obj_add_ptr(struct cr_ctx *ctx, void *ptr, int *objref,
 	obj = cr_obj_find_by_ptr(ctx, ptr);
 	if (!obj) {
 		obj = cr_obj_new(ctx, ptr, 0, type, flags);
-		if (!obj)
-			return -ENOMEM;
+		if (IS_ERR(obj))
+			return PTR_ERR(obj);
 		else
 			ret = 1;
 	} else if (obj->type != type)	/* sanity check */
@@ -229,7 +245,7 @@ int cr_obj_add_ref(struct cr_ctx *ctx, void *ptr, int objref,
 	struct cr_objref *obj;
 
 	obj = cr_obj_new(ctx, ptr, objref, type, flags);
-	return obj ? 0 : -ENOMEM;
+	return IS_ERR(obj) ? PTR_ERR(obj) : 0;
 }
 
 /**
diff --git a/checkpoint/rstr_file.c b/checkpoint/rstr_file.c
index f44b081..e3b684e 100644
--- a/checkpoint/rstr_file.c
+++ b/checkpoint/rstr_file.c
@@ -65,6 +65,55 @@ static int cr_attach_get_file(struct file *file)
 	return fd;
 }
 
+/**
+ * cr_obj_add_file - register a file pointer of a given fd in hash table
+ * @ctx: checkpoint context
+ * @fd: file descriptor
+ * @objref: objrect reference
+ *
+ * Return the file pointer (will be safely referenced in the hash table)
+ */
+static struct file *cr_obj_add_file(struct cr_ctx *ctx, int fd, int objref)
+{
+	struct file *file;
+	int ret;
+
+	file = fget(fd);
+	if (!file)
+		return ERR_PTR(-EBADF);
+	ret = cr_obj_add_ref(ctx, file, objref, CR_OBJ_FILE, 0);
+	fput(file);
+	return (ret < 0 ? ERR_PTR(ret) : file);
+}
+
+/* return a new fd associated with a the file referenced by @hh->objref */
+static int
+cr_read_fd_objref(struct cr_ctx *ctx, struct cr_hdr_fd_data *hh, int rparent)
+{
+	struct file *file;
+
+	file = cr_obj_get_by_ref(ctx, hh->fd_objref, CR_OBJ_FILE);
+	if (IS_ERR(file))
+		return PTR_ERR(file);
+	return cr_attach_get_file(file);
+}
+
+/* return a new fd associated with a new open file/directory */
+static
+int cr_read_fd_file(struct cr_ctx *ctx, struct cr_hdr_fd_data *hh, int rparent)
+{
+	struct file *file;
+	int fd;
+
+	file = cr_read_open_fname(ctx, hh->f_flags, hh->f_mode);
+	if (IS_ERR(file))
+		return PTR_ERR(file);
+	fd = cr_attach_file(file);
+	if (fd < 0)
+		filp_close(file, NULL);
+	return fd;
+}
+
 #define CR_SETFL_MASK (O_APPEND|O_NONBLOCK|O_NDELAY|FASYNC|O_DIRECT|O_NOATIME)
 
 /* cr_read_fd_data - restore the state of a given file pointer */
@@ -74,7 +123,7 @@ cr_read_fd_data(struct cr_ctx *ctx, struct files_struct *files, int rparent)
 	struct cr_hdr_fd_data *hh = cr_hbuf_get(ctx, sizeof(*hh));
 	struct file *file;
 	int parent, ret;
-	int fd = 0;	/* pacify gcc warning */
+	int fd;
 
 	parent = cr_read_obj_type(ctx, hh, sizeof(*hh), CR_HDR_FD_DATA);
 	pr_debug("rparent %d parent %d flags %#x mode %#x how %d\n",
@@ -88,47 +137,45 @@ cr_read_fd_data(struct cr_ctx *ctx, struct files_struct *files, int rparent)
 
 	if (parent != rparent)
 		goto out;
+	if (hh->fd_objref < 0)
+		goto out;
 
 	/* FIX: more sanity checks on f_flags, f_mode etc */
 
 	switch (hh->fd_type) {
 	case CR_FD_FILE:
 	case CR_FD_DIR:
-		file = cr_read_open_fname(ctx, hh->f_flags, hh->f_mode);
+		fd = cr_read_fd_file(ctx, hh, rparent);
+		break;
+	case CR_FD_OBJREF:
+		fd = cr_read_fd_objref(ctx, hh, rparent);
 		break;
 	default:
 		goto out;
 	}
 
-	if (IS_ERR(file)) {
-		ret = PTR_ERR(file);
+	if (fd < 0) {
+		ret = fd;
 		goto out;
 	}
 
 	/* FIX: need to restore uid, gid, owner etc */
 
-	/* adding <objref,file> to the hash will keep a reference to it */
-	ret = cr_obj_add_ref(ctx, file, parent, CR_OBJ_FILE, 0);
-	if (ret < 0) {
-		filp_close(file, NULL);
-		goto out;
-	}
-
-	fd = cr_attach_file(file);	/* no need to cleanup 'file' below */
-	if (fd < 0) {
-		ret = fd;
-		filp_close(file, NULL);
+	/* register new <objref, file> tuple in hash table */
+	file = cr_obj_add_file(ctx, fd, rparent);
+	if (IS_ERR(file)) {
+		ret = PTR_ERR(file);
 		goto out;
 	}
 
-	ret = sys_fcntl(fd, F_SETFL, hh->f_flags & CR_SETFL_MASK);
-	if (ret < 0)
-		goto out;
 	ret = vfs_llseek(file, hh->f_pos, SEEK_SET);
 	if (ret == -ESPIPE)	/* ignore error on non-seekable files */
 		ret = 0;
 
-	ret = 0;
+	if (ret < 0)
+		goto out;
+
+	ret = sys_fcntl(fd, F_SETFL, hh->f_flags & CR_SETFL_MASK);
  out:
 	cr_hbuf_put(ctx, sizeof(*hh));
 	return ret < 0 ? ret : fd;
diff --git a/include/linux/checkpoint.h b/include/linux/checkpoint.h
index 217cf6e..bf89c0f 100644
--- a/include/linux/checkpoint.h
+++ b/include/linux/checkpoint.h
@@ -75,6 +75,7 @@ extern void cr_ctx_put(struct cr_ctx *ctx);
 
 enum {
 	CR_OBJ_FILE = 1,
+	CR_OBJ_INODE,
 	CR_OBJ_MAX
 };
 
diff --git a/include/linux/checkpoint_hdr.h b/include/linux/checkpoint_hdr.h
index 6dc739f..aba99f4 100644
--- a/include/linux/checkpoint_hdr.h
+++ b/include/linux/checkpoint_hdr.h
@@ -144,13 +144,16 @@ struct cr_hdr_fd_ent {
 
 /* fd types */
 enum  fd_type {
-	CR_FD_FILE = 1,
+	CR_FD_OBJREF = 1,
+	CR_FD_FILE,
 	CR_FD_DIR,
 };
 
 struct cr_hdr_fd_data {
-	__u16 fd_type;
-	__u16 f_mode;
+	__u32 fd_type;
+	__u32 fd_objref;
+
+	__u32 f_mode;
 	__u32 f_flags;
 	__u64 f_pos;
 	__u64 f_version;
-- 
1.5.4.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC cr-pipe-v13][PATCH 2/3] Checkpoint open pipes
       [not found] ` <1233091395-24582-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  2009-01-27 21:23   ` Oren Laadan
  2009-01-27 21:23   ` [RFC cr-pipe-v13][PATCH 1/3] A new file type (CR_FD_OBJREF) for a file descriptor already setup Oren Laadan
@ 2009-01-27 21:23   ` Oren Laadan
       [not found]     ` <1233091395-24582-4-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  2009-01-27 21:23   ` [RFC cr-pipe-v13][PATCH 3/3] Restore " Oren Laadan
  2009-01-27 21:29   ` [RFC cr-pipe-v13][PATCH 0/3] c/r of " Oren Laadan
  4 siblings, 1 reply; 17+ messages in thread
From: Oren Laadan @ 2009-01-27 21:23 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Dave Hansen

A pipe is essentially a double-headed inode with a buffer attached to
it. We checkpoint the pipe buffer only once, as soon as we hit one
side of the pipe, regardless whether it is read- or write- end.

To checkpoint a file descriptor that refers to a pipe (either end), we
first lookup the inode in the hash table:

If not found, it is the first encounter of this pipe. Besides the file
descriptor, we also (a) save the pipe data, and (b) register the pipe
inode in the hash. We save the 'objref' of the inode 'in ->fd_objref'
of the file descriptor. The file descriptor type becomes CR_FD_PIPE.

If found, it is the second encounter of this pipe, namely, as we hit
the other end of the same pipe. In this case we need only record the
reference ('objref') to the inode that we had saved before, and the
file descriptor type is changed to CR_FD_OBJREF.

The type CR_FD_PIPE will indicate to the kernel to create a new pipe;
since both ends are created at the same time, one end will be used,
and the other end will be deposited in the hash table for later use.
The type CR_FD_OBJREF will indicate that the corresponding file
descriptor is already setup and registered in the hash using the
'->fd_objref' that it had been assigned.

The format of the pipe data is as follows:

struct cr_hdr_fd_pipe {
       __u32 nr_bufs;
}

cr_hdr + cr_hdr_fd_ent
	cr_hdr + cr_hdr_fd_data
		cr_hdr + cr_hdr_fd_pipe		-> # buffers
			cr_hdr + cr_hdr_buffer	-> 1st buffer
			cr_hdr + cr_hdr_buffer	-> 2nd buffer
			cr_hdr + cr_hdr_buffer	-> 3rd buffer
			...


Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
 checkpoint/ckpt_file.c         |  108 +++++++++++++++++++++++++++++++++++++++-
 include/linux/checkpoint_hdr.h |    7 +++
 2 files changed, 114 insertions(+), 1 deletions(-)

diff --git a/checkpoint/ckpt_file.c b/checkpoint/ckpt_file.c
index f5263f2..9ff401a 100644
--- a/checkpoint/ckpt_file.c
+++ b/checkpoint/ckpt_file.c
@@ -12,6 +12,7 @@
 #include <linux/sched.h>
 #include <linux/file.h>
 #include <linux/fdtable.h>
+#include <linux/pipe_fs_i.h>
 #include <linux/checkpoint.h>
 #include <linux/checkpoint_hdr.h>
 
@@ -72,6 +73,81 @@ int cr_scan_fds(struct files_struct *files, int **fdtable)
 	return n;
 }
 
+/* cr_write_pipebuf - dump contents of a pipe/fifo (assume i_mutex taken) */
+static int cr_write_pipebuf(struct cr_ctx *ctx, struct pipe_inode_info *pipe)
+{
+	struct cr_hdr h;
+	void *kbuf, *addr;
+	int i, ret = 0;
+
+	kbuf = (void *) __get_free_page(GFP_KERNEL);
+	if (!kbuf)
+		return -ENOMEM;
+
+	/* this is a simplified fs/pipe.c:read_pipe() */
+
+	for (i = 0; i < pipe->nrbufs; i++) {
+		int nn = (pipe->curbuf + i) & (PIPE_BUFFERS-1);
+		struct pipe_buffer *pbuf = pipe->bufs + nn;
+		const struct pipe_buf_operations *ops = pbuf->ops;
+
+		ret = ops->confirm(pipe, pbuf);
+		if (ret < 0)
+			break;
+
+		addr = ops->map(pipe, pbuf, 1);
+		memcpy(kbuf, addr + pbuf->offset, pbuf->len);
+		ops->unmap(pipe, pbuf, addr);
+
+		h.type = CR_HDR_BUFFER;
+		h.len = pbuf->len;
+		h.parent = 0;
+
+		ret = cr_write_obj(ctx, &h, kbuf);
+		if (ret < 0)
+			break;
+	}
+
+	free_page((unsigned long) kbuf);
+	return ret;
+}
+
+/* cr_write_pipe - dump pipe (assume i_mutex taken) */
+static int cr_write_pipe(struct cr_ctx *ctx, struct inode *inode, int parent)
+{
+	struct cr_hdr h;
+	struct cr_hdr_fd_pipe *hh;
+	struct pipe_inode_info *pipe = inode->i_pipe;
+	int ret;
+
+	h.type = CR_HDR_FD_PIPE;
+	h.len = sizeof(*hh);
+	h.parent = parent;
+
+	hh = cr_hbuf_get(ctx, sizeof(*hh));
+	hh->nr_bufs = pipe->nrbufs;
+
+	ret = cr_write_obj(ctx, &h, hh);
+	cr_hbuf_put(ctx, sizeof(*hh));
+	if (ret < 0)
+		return ret;
+
+	return cr_write_pipebuf(ctx, pipe);
+}
+
+/* cr_write_fd_fifo - for pipe */
+static int cr_write_fd_pipe(struct cr_ctx *ctx, struct file *file, int parent)
+{
+	struct inode *inode = file->f_dentry->d_inode;
+	int ret;
+
+	mutex_lock(&inode->i_mutex);
+	ret = cr_write_pipe(ctx, inode, parent);
+	mutex_unlock(&inode->i_mutex);
+
+	return ret;
+}
+
 /* cr_write_fd_file - for regular files, directories, symbolic links */
 static int cr_write_fd_file(struct cr_ctx *ctx, struct file *file, int parent)
 {
@@ -86,11 +162,33 @@ static inline enum fd_type cr_inode_to_fdtype(struct inode *inode)
 		return CR_FD_FILE;
 	case S_IFDIR:
 		return CR_FD_DIR;
+	case S_IFIFO:
+		if (inode->i_sb->s_magic == PIPEFS_MAGIC)
+			return CR_FD_PIPE;	/* pipe */
 	}
 	/* file type unsupported */
 	return -EBADF;
 }
 
+/* cr_inode_to_objref: */
+static int
+cr_inode_to_objref(struct cr_ctx *ctx, struct inode *inode, int type, int *new)
+{
+	int objref = 0;
+	int newobj = 1;
+
+	if (type == CR_FD_PIPE) {
+		newobj = cr_obj_add_ptr(ctx, inode, &objref, CR_OBJ_INODE, 0);
+		pr_debug("objref %d inode %p new %d\n", objref, inode, newobj);
+	}
+
+	if (newobj < 0)
+		return newobj;
+
+	*new = newobj;
+	return objref;
+}
+
 /* cr_write_fd_data - dump the state of a given file pointer */
 static int cr_write_fd_data(struct cr_ctx *ctx, struct file *file, int parent)
 {
@@ -98,6 +196,7 @@ static int cr_write_fd_data(struct cr_ctx *ctx, struct file *file, int parent)
 	struct cr_hdr_fd_data *hh = cr_hbuf_get(ctx, sizeof(*hh));
 	struct inode *inode = file->f_dentry->d_inode;
 	enum fd_type fd_type;
+	int new = 0;  /* pacify gcc */
 	int ret;
 
 	h.type = CR_HDR_FD_DATA;
@@ -116,8 +215,12 @@ static int cr_write_fd_data(struct cr_ctx *ctx, struct file *file, int parent)
 		return fd_type;
 	}
 
+	hh->fd_objref = cr_inode_to_objref(ctx, inode, hh->fd_type, &new);
+	pr_debug("type %d objref %d (%d)\n", hh->fd_type, hh->fd_objref, new);
+	if (!new)
+		fd_type = CR_FD_OBJREF;
+
 	hh->fd_type = fd_type;
-	hh->fd_objref = 0;
 
 	/* FIX: check if the inode is unlinked */
 
@@ -130,6 +233,9 @@ static int cr_write_fd_data(struct cr_ctx *ctx, struct file *file, int parent)
 	case CR_FD_OBJREF:
 		/* nothing to do */
 		break;
+	case CR_FD_PIPE:
+		ret = cr_write_fd_pipe(ctx, file, parent);
+		break;
 	case CR_FD_FILE:
 	case CR_FD_DIR:
 		ret = cr_write_fd_file(ctx, file, parent);
diff --git a/include/linux/checkpoint_hdr.h b/include/linux/checkpoint_hdr.h
index aba99f4..0cb1963 100644
--- a/include/linux/checkpoint_hdr.h
+++ b/include/linux/checkpoint_hdr.h
@@ -58,6 +58,7 @@ enum {
 	CR_HDR_FILES = 301,
 	CR_HDR_FD_ENT,
 	CR_HDR_FD_DATA,
+	CR_HDR_FD_PIPE,
 
 	CR_HDR_TAIL = 5001
 };
@@ -147,6 +148,7 @@ enum  fd_type {
 	CR_FD_OBJREF = 1,
 	CR_FD_FILE,
 	CR_FD_DIR,
+	CR_FD_PIPE,
 };
 
 struct cr_hdr_fd_data {
@@ -159,4 +161,9 @@ struct cr_hdr_fd_data {
 	__u64 f_version;
 } __attribute__((aligned(8)));
 
+struct cr_hdr_fd_pipe {
+	__u32 nr_bufs;
+} __attribute__((aligned(8)));
+
+
 #endif /* _CHECKPOINT_CKPT_HDR_H_ */
-- 
1.5.4.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC cr-pipe-v13][PATCH 3/3] Restore open pipes
       [not found] ` <1233091395-24582-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
                     ` (2 preceding siblings ...)
  2009-01-27 21:23   ` [RFC cr-pipe-v13][PATCH 2/3] Checkpoint open pipes Oren Laadan
@ 2009-01-27 21:23   ` Oren Laadan
       [not found]     ` <1233091395-24582-5-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  2009-01-27 21:29   ` [RFC cr-pipe-v13][PATCH 0/3] c/r of " Oren Laadan
  4 siblings, 1 reply; 17+ messages in thread
From: Oren Laadan @ 2009-01-27 21:23 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Dave Hansen

When seeing a CR_FD_PIPE file type, we create a new pipe and thus
have two file pointers (read- and write- ends). We only use one of
them, depending on which side was checkpointed first. We register the
file pointer of the other end in the hash table, with the 'objref'
given for this pipe from the checkpoint, deposited for later use. At
this point we also restore the contents of the pipe buffers.

When the other end arrives, it will have file type CR_FD_OBJREF. We
will then use the corresponding 'objref' to retrieve the file pointer
from the hash table, and attach it to the process.

Note the difference from the checkpoint logic: during checkpoint we
placed the _inode_ of the pipe in the hash table, while during restart
we place the resulting _file_ in the hash table.

We restore the pipe contents we manually allocation and attaching
buffers to the pipe; (alternatively we could read the data from the
image file and then write it into the pipe, or use splice() syscall).


Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
 checkpoint/checkpoint_file.h |    2 +
 checkpoint/rstr_file.c       |  122 +++++++++++++++++++++++++++++++++++++++++-
 fs/pipe.c                    |    2 +-
 3 files changed, 123 insertions(+), 3 deletions(-)

diff --git a/checkpoint/checkpoint_file.h b/checkpoint/checkpoint_file.h
index 9dc3eba..cac1a5d 100644
--- a/checkpoint/checkpoint_file.h
+++ b/checkpoint/checkpoint_file.h
@@ -14,4 +14,6 @@
 
 int cr_scan_fds(struct files_struct *files, int **fdtable);
 
+extern const struct pipe_buf_operations anon_pipe_buf_ops;
+
 #endif /* _CHECKPOINT_CKPT_FILE_H_ */
diff --git a/checkpoint/rstr_file.c b/checkpoint/rstr_file.c
index e3b684e..444ebed 100644
--- a/checkpoint/rstr_file.c
+++ b/checkpoint/rstr_file.c
@@ -14,6 +14,8 @@
 #include <linux/file.h>
 #include <linux/fdtable.h>
 #include <linux/fsnotify.h>
+#include <linux/pagemap.h>
+#include <linux/pipe_fs_i.h>
 #include <linux/syscalls.h>
 #include <linux/checkpoint.h>
 #include <linux/checkpoint_hdr.h>
@@ -86,6 +88,119 @@ static struct file *cr_obj_add_file(struct cr_ctx *ctx, int fd, int objref)
 	return (ret < 0 ? ERR_PTR(ret) : file);
 }
 
+/* cr_read_pipebuf - restore contents of a pipe/fifo (assume i_mutex taken) */
+static int
+cr_read_pipebuf(struct cr_ctx *ctx, struct pipe_inode_info *pipe, int nbufs)
+{
+	void *kbuf, *addr;
+	int i, ret = 0;
+
+	kbuf = (void *) __get_free_page(GFP_KERNEL);
+	if (!kbuf)
+		return -ENOMEM;
+
+	for (i = 0; i < nbufs; i++) {
+		struct pipe_buffer *pbuf = pipe->bufs + i;
+		struct page *page;
+		int len = PAGE_SIZE;
+
+		ret = cr_read_buffer(ctx, kbuf, &len);
+		if (ret < 0)
+			break;
+		page = alloc_page(GFP_HIGHUSER);
+		if (!page) {
+			ret = -ENOMEM;
+			break;
+		}
+
+		addr = kmap_atomic(page, KM_USER0);
+		memcpy(addr, kbuf, len);
+		kunmap_atomic(addr, KM_USER0);
+
+		pbuf->page = page;
+		pbuf->ops = &anon_pipe_buf_ops;
+		pbuf->offset = 0;
+		pbuf->len = len;
+		pipe->nrbufs++;
+		pipe->tmp_page = NULL;
+	}
+
+	free_page((unsigned long) kbuf);
+	return ret;
+}
+
+/* cr_read_pipe - restore pipe (assume i_mutex taken) */
+static int cr_read_pipe(struct cr_ctx *ctx, int pipefd)
+{
+	struct cr_hdr_fd_pipe *hh;
+	struct file *file;
+	struct inode *inode;
+	int nbufs, ret;
+
+	hh = cr_hbuf_get(ctx, sizeof(*hh));
+	ret = cr_read_obj_type(ctx, hh, sizeof(*hh), CR_HDR_FD_PIPE);
+	nbufs = hh->nr_bufs;
+	cr_hbuf_put(ctx, sizeof(*hh));
+
+	if (ret < 0)
+		return ret;
+	if (nbufs > PIPE_BUFFERS)
+		return -EINVAL;
+
+	file = fget(pipefd);
+	if (!file)
+		return -EIO;
+
+	inode = file->f_dentry->d_inode;
+	mutex_lock(&inode->i_mutex);
+	ret = cr_read_pipebuf(ctx, inode->i_pipe, nbufs);
+	mutex_unlock(&inode->i_mutex);
+
+	fput(file);
+	return ret;
+}
+
+/* restore a pipe */
+static int
+cr_read_fd_pipe(struct cr_ctx *ctx, struct cr_hdr_fd_data *hh, int rparent)
+{
+	struct file *file;
+	int fds[2], which, ret;
+
+	file = cr_obj_get_by_ref(ctx, hh->fd_objref, CR_OBJ_FILE);
+	if (IS_ERR(file))
+		return PTR_ERR(file);
+	else if (file)
+		return cr_attach_get_file(file);
+
+	/* first encounter of this pipe: create it */
+	ret = do_pipe(fds);
+	if (ret < 0)
+		return ret;
+
+	which = (hh->f_flags & O_WRONLY ? 1 : 0);
+
+	/*
+	 * Below we return the fd corersponding to one side of the pipe
+	 * for our caller to use. Now register the other side of the pipe
+	 * in the hash, to be picked up when that side is to be restored.
+	 */
+	file = cr_obj_add_file(ctx, fds[1-which], hh->fd_objref);
+	if (IS_ERR(file)) {
+		ret = PTR_ERR(file);
+		goto out;
+	}
+
+	ret = cr_read_pipe(ctx, fds[which]);
+ out:
+	sys_close(fds[1-which]);	/* this side isn't used anymore */
+	if (ret < 0)
+		sys_close(fds[which]);
+	else
+		ret = fds[which];
+	return ret;
+}
+
 /* return a new fd associated with a the file referenced by @hh->objref */
 static int
 cr_read_fd_objref(struct cr_ctx *ctx, struct cr_hdr_fd_data *hh, int rparent)
@@ -143,12 +258,15 @@ cr_read_fd_data(struct cr_ctx *ctx, struct files_struct *files, int rparent)
 	/* FIX: more sanity checks on f_flags, f_mode etc */
 
 	switch (hh->fd_type) {
+	case CR_FD_OBJREF:
+		fd = cr_read_fd_objref(ctx, hh, rparent);
+		break;
 	case CR_FD_FILE:
 	case CR_FD_DIR:
 		fd = cr_read_fd_file(ctx, hh, rparent);
 		break;
-	case CR_FD_OBJREF:
-		fd = cr_read_fd_objref(ctx, hh, rparent);
+	case CR_FD_PIPE:
+		fd = cr_read_fd_pipe(ctx, hh, rparent);
 		break;
 	default:
 		goto out;
diff --git a/fs/pipe.c b/fs/pipe.c
index 7aea8b8..8c551ac 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -268,7 +268,7 @@ int generic_pipe_buf_confirm(struct pipe_inode_info *info,
 	return 0;
 }
 
-static const struct pipe_buf_operations anon_pipe_buf_ops = {
+const struct pipe_buf_operations anon_pipe_buf_ops = {
 	.can_merge = 1,
 	.map = generic_pipe_buf_map,
 	.unmap = generic_pipe_buf_unmap,
-- 
1.5.4.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [RFC cr-pipe-v13][PATCH 0/3] c/r of open pipes
       [not found] ` <1233091395-24582-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
                     ` (3 preceding siblings ...)
  2009-01-27 21:23   ` [RFC cr-pipe-v13][PATCH 3/3] Restore " Oren Laadan
@ 2009-01-27 21:29   ` Oren Laadan
  4 siblings, 0 replies; 17+ messages in thread
From: Oren Laadan @ 2009-01-27 21:29 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Dave Hansen


The patches can pulled from the git tree (branch 'ckpt-v13-dev')
	git://git.ncl.cs.columbia.edu/pub/git/linux-cr.git

Oren Laadan wrote:
> Adds support for chekcpoint-restart of open pipes.
> 
> This patchset applies on top of 'ckpt-v13' recent patchset.
> 
> A new test program (test4.c in userspace tools) provides a test
> case for this new functionality.
> 
> Oren.
> 
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC cr-pipe-v13][PATCH 3/3] Restore open pipes
       [not found]     ` <1233091395-24582-5-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2009-02-02 17:43       ` Dan Smith
       [not found]         ` <87k5887n16.fsf-FLMGYpZoEPULwtHQx/6qkW3U47Q5hpJU@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Dan Smith @ 2009-02-02 17:43 UTC (permalink / raw)
  To: Oren Laadan
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Dave Hansen

OL> +/* restore a pipe */
OL> +static int
OL> +cr_read_fd_pipe(struct cr_ctx *ctx, struct cr_hdr_fd_data *hh, int rparent)
OL> +{
OL> +	struct file *file;
OL> +	int fds[2], which, ret;
OL> +
OL> +	file = cr_obj_get_by_ref(ctx, hh->fd_objref, CR_OBJ_FILE);
OL> +	if (IS_ERR(file))
OL> +		return PTR_ERR(file);
OL> +	else if (file)
OL> +		return cr_attach_get_file(file);
OL> +
OL> +	/* first encounter of this pipe: create it */
OL> +	ret = do_pipe(fds);
OL> +	if (ret < 0)
OL> +		return ret;
OL> +
OL> +	which = (hh->f_flags & O_WRONLY ? 1 : 0);
OL> +
OL> +	/*
OL> +	 * Below we return the fd corersponding to one side of the pipe
OL> +	 * for our caller to use. Now register the other side of the pipe
OL> +	 * in the hash, to be picked up when that side is to be restored.
OL> +	 */
OL> +	file = cr_obj_add_file(ctx, fds[1-which], hh->fd_objref);
OL> +	if (IS_ERR(file)) {
OL> +		ret = PTR_ERR(file);
OL> +		goto out;
OL> +	}
OL> +
OL> +	ret = cr_read_pipe(ctx, fds[which]);
OL> + out:
OL> +	sys_close(fds[1-which]);	/* this side isn't used anymore */

This isn't always a valid thing to do, right?  I can think of two
scenarios off the top of my head that will break here:

1. The process has called pipe() but has not yet fork()'d
2. The process is using both sides of a pipe for a select()-based
   event loop

I worked up a quick test for #1, and it dies immediately on restart
with a SIGPIPE.

I'll see if I can work up and post a patch to fix this if I can come
up with something I think is reasonable.

Thanks!

-- 
Dan Smith
IBM Linux Technology Center
email: danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC cr-pipe-v13][PATCH 3/3] Restore open pipes
       [not found]         ` <87k5887n16.fsf-FLMGYpZoEPULwtHQx/6qkW3U47Q5hpJU@public.gmane.org>
@ 2009-02-03  0:02           ` Oren Laadan
       [not found]             ` <498789B1.80601-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Oren Laadan @ 2009-02-03  0:02 UTC (permalink / raw)
  To: Dan Smith
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Dave Hansen



Dan Smith wrote:
> OL> +/* restore a pipe */
> OL> +static int
> OL> +cr_read_fd_pipe(struct cr_ctx *ctx, struct cr_hdr_fd_data *hh, int rparent)
> OL> +{
> OL> +	struct file *file;
> OL> +	int fds[2], which, ret;
> OL> +
> OL> +	file = cr_obj_get_by_ref(ctx, hh->fd_objref, CR_OBJ_FILE);
> OL> +	if (IS_ERR(file))
> OL> +		return PTR_ERR(file);
> OL> +	else if (file)
> OL> +		return cr_attach_get_file(file);
> OL> +
> OL> +	/* first encounter of this pipe: create it */
> OL> +	ret = do_pipe(fds);

point (1)

> OL> +	if (ret < 0)
> OL> +		return ret;
> OL> +
> OL> +	which = (hh->f_flags & O_WRONLY ? 1 : 0);

point (2)

> OL> +
> OL> +	/*
> OL> +	 * Below we return the fd corersponding to one side of the pipe
> OL> +	 * for our caller to use. Now register the other side of the pipe
> OL> +	 * in the hash, to be picked up when that side is to be restored.
> OL> +	 */
> OL> +	file = cr_obj_add_file(ctx, fds[1-which], hh->fd_objref);
> OL> +	if (IS_ERR(file)) {
> OL> +		ret = PTR_ERR(file);
> OL> +		goto out;
> OL> +	}

point (3)

> OL> +
> OL> +	ret = cr_read_pipe(ctx, fds[which]);
> OL> + out:
> OL> +	sys_close(fds[1-which]);	/* this side isn't used anymore */

point (4)

> 
> This isn't always a valid thing to do, right?  I can think of two
> scenarios off the top of my head that will break here:
> 
> 1. The process has called pipe() but has not yet fork()'d
> 2. The process is using both sides of a pipe for a select()-based
>    event loop

In both scenarios, both ends of the pipes belong to a single process (and
only to it). Let's assume we restore the 'write' end first:

(1) we create the pipe (both ends)
(2) @which should be "1"
(3) we register the read-end file pointer (1-@which) in the objhash
(4) we close the file descriptor

At this point, the read end of the pipe should remain alive, because it
was put in the objhash and a reference was kept to it.

Some time later (probably when we process the next fd) we'll hit the
other end of the pipe, and we'll install it as is by taking the file
pointer from the objhash and attaching it to a new (empty) fd.

> 
> I worked up a quick test for #1, and it dies immediately on restart
> with a SIGPIPE.
> 

So the implementation of the logic above is ... inaccurate :(
I can reproduce it here; will take a closer look at it soon.

> I'll see if I can work up and post a patch to fix this if I can come
> up with something I think is reasonable.

Thanks,

Oren.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC cr-pipe-v13][PATCH 3/3] Restore open pipes
       [not found]             ` <498789B1.80601-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2009-02-03 16:23               ` Dan Smith
  0 siblings, 0 replies; 17+ messages in thread
From: Dan Smith @ 2009-02-03 16:23 UTC (permalink / raw)
  To: Oren Laadan
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Dave Hansen

OL> So the implementation of the logic above is ... inaccurate :( I
OL> can reproduce it here; will take a closer look at it soon.

Indeed.  I think fixing the following typo makes my test case work as
expected for me:

diff --git a/checkpoint/ckpt_file.c b/checkpoint/ckpt_file.c
index 9ff401a..71aafb4 100644
--- a/checkpoint/ckpt_file.c
+++ b/checkpoint/ckpt_file.c
@@ -215,7 +215,7 @@ static int cr_write_fd_data(struct cr_ctx *ctx, struct file *file, int parent)
 		return fd_type;
 	}
 
-	hh->fd_objref = cr_inode_to_objref(ctx, inode, hh->fd_type, &new);
+	hh->fd_objref = cr_inode_to_objref(ctx, inode, fd_type, &new);
 	pr_debug("type %d objref %d (%d)\n", hh->fd_type, hh->fd_objref, new);
 	if (!new)
 		fd_type = CR_FD_OBJREF;


-- 
Dan Smith
IBM Linux Technology Center
email: danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [RFC cr-pipe-v13][PATCH 2/3] Checkpoint open pipes
       [not found]     ` <1233091395-24582-4-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2009-02-05  9:45       ` Cedric Le Goater
       [not found]         ` <498AB553.3090902-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Cedric Le Goater @ 2009-02-05  9:45 UTC (permalink / raw)
  To: Oren Laadan
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Dave Hansen

>  
> +/* cr_write_pipebuf - dump contents of a pipe/fifo (assume i_mutex taken) */
> +static int cr_write_pipebuf(struct cr_ctx *ctx, struct pipe_inode_info *pipe)
> +{
> +	struct cr_hdr h;
> +	void *kbuf, *addr;
> +	int i, ret = 0;
> +
> +	kbuf = (void *) __get_free_page(GFP_KERNEL);

this can sleep and inode->i_mutex is locked.

> +	if (!kbuf)
> +		return -ENOMEM;
> +
> +	/* this is a simplified fs/pipe.c:read_pipe() */
> +
> +	for (i = 0; i < pipe->nrbufs; i++) {
> +		int nn = (pipe->curbuf + i) & (PIPE_BUFFERS-1);
> +		struct pipe_buffer *pbuf = pipe->bufs + nn;
> +		const struct pipe_buf_operations *ops = pbuf->ops;
> +
> +		ret = ops->confirm(pipe, pbuf);
> +		if (ret < 0)
> +			break;
> +
> +		addr = ops->map(pipe, pbuf, 1);
> +		memcpy(kbuf, addr + pbuf->offset, pbuf->len);
> +		ops->unmap(pipe, pbuf, addr);
> +
> +		h.type = CR_HDR_BUFFER;
> +		h.len = pbuf->len;
> +		h.parent = 0;
> +
> +		ret = cr_write_obj(ctx, &h, kbuf);

same here.
 
> +		if (ret < 0)
> +			break;
> +	}
> +
> +	free_page((unsigned long) kbuf);
> +	return ret;
> +}

I think that cr_write_pipebuf() should be a service from fs/pipe.c. It
exposes a lot of pipe internals.

a 'dump' and 'restore' inode operations might be could solution to the
general problem of dumping and restoring inode contents. other file types
will have similar needs. 

C.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC cr-pipe-v13][PATCH 2/3] Checkpoint open pipes
       [not found]         ` <498AB553.3090902-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
@ 2009-02-06  2:26           ` Nathan Lynch
       [not found]             ` <20090205202637.4d4f4a29-4v5LP+xe+1byhTdZtsIeww@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Nathan Lynch @ 2009-02-06  2:26 UTC (permalink / raw)
  To: Cedric Le Goater
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Dave Hansen

On Thu, 05 Feb 2009 10:45:55 +0100
Cedric Le Goater <clg-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> wrote:

> >  
> > +/* cr_write_pipebuf - dump contents of a pipe/fifo (assume i_mutex taken) */
> > +static int cr_write_pipebuf(struct cr_ctx *ctx, struct pipe_inode_info *pipe)
> > +{
> > +	struct cr_hdr h;
> > +	void *kbuf, *addr;
> > +	int i, ret = 0;
> > +
> > +	kbuf = (void *) __get_free_page(GFP_KERNEL);
> 
> this can sleep and inode->i_mutex is locked.

Generally, it is okay to perform operations that may sleep while
holding a mutex (not so with spinlocks, though).  Unless the page
allocator could try to acquire the same inode->i_mutex, this code
should be fine, no?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC cr-pipe-v13][PATCH 2/3] Checkpoint open pipes
       [not found]             ` <20090205202637.4d4f4a29-4v5LP+xe+1byhTdZtsIeww@public.gmane.org>
@ 2009-02-06 13:05               ` Cedric Le Goater
  2009-02-06 17:11               ` Dave Hansen
  1 sibling, 0 replies; 17+ messages in thread
From: Cedric Le Goater @ 2009-02-06 13:05 UTC (permalink / raw)
  To: Nathan Lynch
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Dave Hansen

Nathan Lynch wrote:
> On Thu, 05 Feb 2009 10:45:55 +0100
> Cedric Le Goater <clg-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> wrote:
> 
>>>  
>>> +/* cr_write_pipebuf - dump contents of a pipe/fifo (assume i_mutex taken) */
>>> +static int cr_write_pipebuf(struct cr_ctx *ctx, struct pipe_inode_info *pipe)
>>> +{
>>> +	struct cr_hdr h;
>>> +	void *kbuf, *addr;
>>> +	int i, ret = 0;
>>> +
>>> +	kbuf = (void *) __get_free_page(GFP_KERNEL);
>> this can sleep and inode->i_mutex is locked.
> 
> Generally, it is okay to perform operations that may sleep while
> holding a mutex (not so with spinlocks, though).  Unless the page
> allocator could try to acquire the same inode->i_mutex, this code
> should be fine, no?

indeed. sorry for the noise.   

C.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC cr-pipe-v13][PATCH 2/3] Checkpoint open pipes
       [not found]             ` <20090205202637.4d4f4a29-4v5LP+xe+1byhTdZtsIeww@public.gmane.org>
  2009-02-06 13:05               ` Cedric Le Goater
@ 2009-02-06 17:11               ` Dave Hansen
  2009-02-06 17:20                 ` Cedric Le Goater
  1 sibling, 1 reply; 17+ messages in thread
From: Dave Hansen @ 2009-02-06 17:11 UTC (permalink / raw)
  To: Nathan Lynch
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Cedric Le Goater

On Thu, 2009-02-05 at 20:26 -0600, Nathan Lynch wrote:
> On Thu, 05 Feb 2009 10:45:55 +0100
> Cedric Le Goater <clg-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> wrote:
> > > +/* cr_write_pipebuf - dump contents of a pipe/fifo (assume i_mutex taken) */
> > > +static int cr_write_pipebuf(struct cr_ctx *ctx, struct pipe_inode_info *pipe)
> > > +{
> > > +	struct cr_hdr h;
> > > +	void *kbuf, *addr;
> > > +	int i, ret = 0;
> > > +
> > > +	kbuf = (void *) __get_free_page(GFP_KERNEL);
> > 
> > this can sleep and inode->i_mutex is locked.
> 
> Generally, it is okay to perform operations that may sleep while
> holding a mutex (not so with spinlocks, though).  Unless the page
> allocator could try to acquire the same inode->i_mutex, this code
> should be fine, no?

Sleeping inside mutexes is OK.  In general, they're drop-in compatible
with semaphore behavior.

-- Dave

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC cr-pipe-v13][PATCH 2/3] Checkpoint open pipes
  2009-02-06 17:11               ` Dave Hansen
@ 2009-02-06 17:20                 ` Cedric Le Goater
       [not found]                   ` <498C715B.4050303-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Cedric Le Goater @ 2009-02-06 17:20 UTC (permalink / raw)
  To: Dave Hansen
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Nathan Lynch

Dave Hansen wrote:
> On Thu, 2009-02-05 at 20:26 -0600, Nathan Lynch wrote:
>> On Thu, 05 Feb 2009 10:45:55 +0100
>> Cedric Le Goater <clg-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> wrote:
>>>> +/* cr_write_pipebuf - dump contents of a pipe/fifo (assume i_mutex taken) */
>>>> +static int cr_write_pipebuf(struct cr_ctx *ctx, struct pipe_inode_info *pipe)
>>>> +{
>>>> +	struct cr_hdr h;
>>>> +	void *kbuf, *addr;
>>>> +	int i, ret = 0;
>>>> +
>>>> +	kbuf = (void *) __get_free_page(GFP_KERNEL);
>>> this can sleep and inode->i_mutex is locked.
>> Generally, it is okay to perform operations that may sleep while
>> holding a mutex (not so with spinlocks, though).  Unless the page
>> allocator could try to acquire the same inode->i_mutex, this code
>> should be fine, no?
> 
> Sleeping inside mutexes is OK.  In general, they're drop-in compatible
> with semaphore behavior.

what about the vfs_write() ? 

C.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC cr-pipe-v13][PATCH 2/3] Checkpoint open pipes
       [not found]                   ` <498C715B.4050303-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
@ 2009-02-06 17:24                     ` Dave Hansen
  2009-02-07 12:54                       ` Cedric Le Goater
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Hansen @ 2009-02-06 17:24 UTC (permalink / raw)
  To: Cedric Le Goater
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Nathan Lynch

On Fri, 2009-02-06 at 18:20 +0100, Cedric Le Goater wrote:
> > Sleeping inside mutexes is OK.  In general, they're drop-in compatible
> > with semaphore behavior.
> 
> what about the vfs_write() ? 

Unless vfs_write() can come back and take the same mutex, I still think
you're OK.  

-- Dave

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC cr-pipe-v13][PATCH 2/3] Checkpoint open pipes
  2009-02-06 17:24                     ` Dave Hansen
@ 2009-02-07 12:54                       ` Cedric Le Goater
       [not found]                         ` <498D8489.6090904-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Cedric Le Goater @ 2009-02-07 12:54 UTC (permalink / raw)
  To: Dave Hansen
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Nathan Lynch

Dave Hansen wrote:
> On Fri, 2009-02-06 at 18:20 +0100, Cedric Le Goater wrote:
>>> Sleeping inside mutexes is OK.  In general, they're drop-in compatible
>>> with semaphore behavior.
>> what about the vfs_write() ? 
> 
> Unless vfs_write() can come back and take the same mutex, I still think
> you're OK.  

generic_file_aio_write() locks the inode mutex.

C.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC cr-pipe-v13][PATCH 2/3] Checkpoint open pipes
       [not found]                         ` <498D8489.6090904-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
@ 2009-02-09 14:39                           ` Cedric Le Goater
  0 siblings, 0 replies; 17+ messages in thread
From: Cedric Le Goater @ 2009-02-09 14:39 UTC (permalink / raw)
  To: Cedric Le Goater
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Nathan Lynch, Dave Hansen

Cedric Le Goater wrote:
> Dave Hansen wrote:
>> On Fri, 2009-02-06 at 18:20 +0100, Cedric Le Goater wrote:
>>>> Sleeping inside mutexes is OK.  In general, they're drop-in compatible
>>>> with semaphore behavior.
>>> what about the vfs_write() ? 
>> Unless vfs_write() can come back and take the same mutex, I still think
>> you're OK.  
> 
> generic_file_aio_write() locks the inode mutex.

For example, here's what we would get with lockdep enabled on similar
code we use to dump pipes. 

hmm, I think this is a false lockdep alarm.

C.

[   49.460000] =============================================
[   49.460000] [ INFO: possible recursive locking detected ]
[   49.460000] 2.6.27-00195-g2aa4c1c-dirty #85
[   49.460000] ---------------------------------------------
[   49.460000] mcr-checkpoint/1931 is trying to acquire lock:
[   49.460000]  (&sb->s_type->i_mutex_key#3){--..}, at: [<c014da37>] generic_file_aio_write+0x54/0xbd
[   49.460000] 
[   49.460000] but task is already holding lock:
[   49.460000]  (&sb->s_type->i_mutex_key#3){--..}, at: [<c016ed63>] pipe_dump_content+0x20/0x14e
[   49.460000] 
[   49.460000] other info that might help us debug this:
[   49.460000] 1 lock held by mcr-checkpoint/1931:
[   49.460000]  #0:  (&sb->s_type->i_mutex_key#3){--..}, at: [<c016ed63>] pipe_dump_content+0x20/0x14e
[   49.460000] 
[   49.460000] stack backtrace:
[   49.460000] Pid: 1931, comm: mcr-checkpoint Not tainted 2.6.27-00195-g2aa4c1c-dirty #85
[   49.460000]  [<c0282761>] ? printk+0xf/0x16
[   49.460000]  [<c0134ed9>] __lock_acquire+0xb52/0x11d5
[   49.460000]  [<c0186cdd>] ? generic_write_end+0x6d/0x77
[   49.460000]  [<c014c971>] ? generic_file_buffered_write+0x178/0x503
[   49.460000]  [<c01355a4>] lock_acquire+0x48/0x64
[   49.460000]  [<c014da37>] ? generic_file_aio_write+0x54/0xbd
[   49.460000]  [<c0283811>] mutex_lock_nested+0xc2/0x208
[   49.460000]  [<c014da37>] ? generic_file_aio_write+0x54/0xbd
[   49.460000]  [<c014da37>] ? generic_file_aio_write+0x54/0xbd
[   49.460000]  [<c014da37>] generic_file_aio_write+0x54/0xbd
[   49.460000]  [<c0169a04>] do_sync_write+0xab/0xe9
[   49.460000]  [<c012a345>] ? autoremove_wake_function+0x0/0x33
[   49.460000]  [<c0133acb>] ? mark_held_locks+0x53/0x6a
[   49.460000]  [<c0169959>] ? do_sync_write+0x0/0xe9
[   49.460000]  [<c016a1b3>] vfs_write+0x8a/0x104
[   49.460000]  [<c016ee5b>] pipe_dump_content+0x118/0x14e
[   49.460000]  [<c01445ad>] mcr_pipe_dump_content+0x25/0x35
[   49.460000]  [<c014472c>] sys_mcr+0x16f/0x2eb
[   49.460000]  [<c0133c42>] ? trace_hardirqs_on_caller+0xe1/0x102
[   49.460000]  [<c0102cd6>] syscall_call+0x7/0xb
[   49.460000]  =======================

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2009-02-09 14:39 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-01-27 21:23 [RFC cr-pipe-v13][PATCH 0/3] c/r of open pipes Oren Laadan
     [not found] ` <1233091395-24582-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-01-27 21:23   ` Oren Laadan
2009-01-27 21:23   ` [RFC cr-pipe-v13][PATCH 1/3] A new file type (CR_FD_OBJREF) for a file descriptor already setup Oren Laadan
2009-01-27 21:23   ` [RFC cr-pipe-v13][PATCH 2/3] Checkpoint open pipes Oren Laadan
     [not found]     ` <1233091395-24582-4-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-02-05  9:45       ` Cedric Le Goater
     [not found]         ` <498AB553.3090902-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
2009-02-06  2:26           ` Nathan Lynch
     [not found]             ` <20090205202637.4d4f4a29-4v5LP+xe+1byhTdZtsIeww@public.gmane.org>
2009-02-06 13:05               ` Cedric Le Goater
2009-02-06 17:11               ` Dave Hansen
2009-02-06 17:20                 ` Cedric Le Goater
     [not found]                   ` <498C715B.4050303-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
2009-02-06 17:24                     ` Dave Hansen
2009-02-07 12:54                       ` Cedric Le Goater
     [not found]                         ` <498D8489.6090904-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
2009-02-09 14:39                           ` Cedric Le Goater
2009-01-27 21:23   ` [RFC cr-pipe-v13][PATCH 3/3] Restore " Oren Laadan
     [not found]     ` <1233091395-24582-5-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-02-02 17:43       ` Dan Smith
     [not found]         ` <87k5887n16.fsf-FLMGYpZoEPULwtHQx/6qkW3U47Q5hpJU@public.gmane.org>
2009-02-03  0:02           ` Oren Laadan
     [not found]             ` <498789B1.80601-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-02-03 16:23               ` Dan Smith
2009-01-27 21:29   ` [RFC cr-pipe-v13][PATCH 0/3] c/r of " Oren Laadan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.