* [RFC cr-pipe-v13][PATCH 0/3] c/r of open pipes
[not found] ` <1233091395-24582-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2009-01-27 21:23 ` Oren Laadan
2009-01-27 21:23 ` [RFC cr-pipe-v13][PATCH 1/3] A new file type (CR_FD_OBJREF) for a file descriptor already setup Oren Laadan
` (3 subsequent siblings)
4 siblings, 0 replies; 17+ messages in thread
From: Oren Laadan @ 2009-01-27 21:23 UTC (permalink / raw)
To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Dave Hansen
This patchset adds support for chekcpoint-restart of open pipes.
It applies on top of 'ckpt-v13' patch set.
A new test program (test4.c in userspace tools) provides a test
case for this new functionality.
Oren.
^ permalink raw reply [flat|nested] 17+ messages in thread* [RFC cr-pipe-v13][PATCH 1/3] A new file type (CR_FD_OBJREF) for a file descriptor already setup
[not found] ` <1233091395-24582-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-01-27 21:23 ` Oren Laadan
@ 2009-01-27 21:23 ` Oren Laadan
2009-01-27 21:23 ` [RFC cr-pipe-v13][PATCH 2/3] Checkpoint open pipes Oren Laadan
` (2 subsequent siblings)
4 siblings, 0 replies; 17+ messages in thread
From: Oren Laadan @ 2009-01-27 21:23 UTC (permalink / raw)
To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Dave Hansen
While file pointers are shared objects, they may share an underlying
object themselves. For instance, file pointers of both ends of a pipe
that share the same pipe inode. In this case, the shared entity to
handle is the inode that is shared among two file pointers (e.g read-
and write- ends). In this sort of "nested sharing" we need only save
the underlying object once (upon first encounter) on checkpoint, and
restore it only once during restart.
To checkpoint a file descriptor of this sort, we first lookup the
inode in the hash table:
If not found, it is the first encounter of this inode. Here, Besides
the file descriptor data, we also (a) register the inode in the hash
and save the corresponding 'objref' of this inode in '->fd_objref' of
the file descriptor. We then also (b) save the inode data, as per the
inode type (this is not implemented in this patch, as it depends on
the object). The file descriptor type will indicate the type of that
object (e.g. for a pipe, when supported, CR_FD_PIPE).
If found, it is the second encounter of this inode, e.g. in the case
of a pipe, as we hit the other end of the same pipe. At this point we
need only record the reference ('objref') to the inode that we had
saved before, and the file descriptor type is changed to CR_FD_OBJREF.
The logic during restart is similar: the '->fd_objref' is looked up in
the hash table. Unlike checkpoint, during restart the object that is
placed (and sought) in the hash table is the _file_ pointer, rather
than the _inode_.
If not found, it is the first encounter of this inode. Therefore we
(a) restore the inode data. Specifically, we construct a matching
object and end up with multiple file pointers (e.g. if the object is a
pipe, we will have both read- and write- ends). One of those is used
for the file descriptor in question; the other(s) will be deposited in
the hash table, to be retrieved and used later on. We also (b) register
the newly created inode in the hash table using the given 'objref'.
If found, then we can skip the setup of the underlying object that
is represented by the inode.
The type CR_FD_OBJREF indicates, on restart, that the corresponding
file descriptor is already setup and registered in the hash under the
'->fd_objref' that it had been assigned.
The next two patches use CR_FD_OBJREF to implement support for pipes.
Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
checkpoint/ckpt_file.c | 52 ++++++++++++++++++------
checkpoint/objhash.c | 30 +++++++++++---
checkpoint/rstr_file.c | 85 +++++++++++++++++++++++++++++++---------
include/linux/checkpoint.h | 1 +
include/linux/checkpoint_hdr.h | 9 +++-
5 files changed, 135 insertions(+), 42 deletions(-)
diff --git a/checkpoint/ckpt_file.c b/checkpoint/ckpt_file.c
index e3097ac..f5263f2 100644
--- a/checkpoint/ckpt_file.c
+++ b/checkpoint/ckpt_file.c
@@ -72,13 +72,31 @@ int cr_scan_fds(struct files_struct *files, int **fdtable)
return n;
}
+/* cr_write_fd_file - for regular files, directories, symbolic links */
+static int cr_write_fd_file(struct cr_ctx *ctx, struct file *file, int parent)
+{
+ return cr_write_fname(ctx, &file->f_path, &ctx->fs_mnt);
+}
+
+/* cr_inode_to_fdtype - determine the fd type given an inode */
+static inline enum fd_type cr_inode_to_fdtype(struct inode *inode)
+{
+ switch (inode->i_mode & S_IFMT) {
+ case S_IFREG:
+ return CR_FD_FILE;
+ case S_IFDIR:
+ return CR_FD_DIR;
+ }
+ /* file type unsupported */
+ return -EBADF;
+}
+
/* cr_write_fd_data - dump the state of a given file pointer */
static int cr_write_fd_data(struct cr_ctx *ctx, struct file *file, int parent)
{
struct cr_hdr h;
struct cr_hdr_fd_data *hh = cr_hbuf_get(ctx, sizeof(*hh));
- struct dentry *dent = file->f_dentry;
- struct inode *inode = dent->d_inode;
+ struct inode *inode = file->f_dentry->d_inode;
enum fd_type fd_type;
int ret;
@@ -92,27 +110,35 @@ static int cr_write_fd_data(struct cr_ctx *ctx, struct file *file, int parent)
hh->f_version = file->f_version;
/* FIX: need also file->uid, file->gid, file->f_owner, etc */
- switch (inode->i_mode & S_IFMT) {
- case S_IFREG:
- fd_type = CR_FD_FILE;
- break;
- case S_IFDIR:
- fd_type = CR_FD_DIR;
- break;
- default:
+ fd_type = cr_inode_to_fdtype(inode);
+ if (fd_type < 0) {
cr_hbuf_put(ctx, sizeof(*hh));
- return -EBADF;
+ return fd_type;
}
- /* FIX: check if the file/dir/link is unlinked */
hh->fd_type = fd_type;
+ hh->fd_objref = 0;
+
+ /* FIX: check if the inode is unlinked */
ret = cr_write_obj(ctx, &h, hh);
cr_hbuf_put(ctx, sizeof(*hh));
if (ret < 0)
return ret;
- return cr_write_fname(ctx, &file->f_path, &ctx->fs_mnt);
+ switch (fd_type) {
+ case CR_FD_OBJREF:
+ /* nothing to do */
+ break;
+ case CR_FD_FILE:
+ case CR_FD_DIR:
+ ret = cr_write_fd_file(ctx, file, parent);
+ break;
+ default:
+ BUG();
+ }
+
+ return ret;
}
/**
diff --git a/checkpoint/objhash.c b/checkpoint/objhash.c
index ee31b38..01fd2dc 100644
--- a/checkpoint/objhash.c
+++ b/checkpoint/objhash.c
@@ -35,20 +35,31 @@ static void cr_obj_ref_drop(struct cr_objref *obj)
case CR_OBJ_FILE:
fput((struct file *) obj->ptr);
break;
+ case CR_OBJ_INODE:
+ iput((struct inode *) obj->ptr);
+ break;
default:
BUG();
}
}
-static void cr_obj_ref_grab(struct cr_objref *obj)
+static int cr_obj_ref_grab(struct cr_objref *obj)
{
+ int ret = 0;
+
switch (obj->type) {
case CR_OBJ_FILE:
get_file((struct file *) obj->ptr);
break;
+ case CR_OBJ_INODE:
+ if (!igrab((struct inode *) obj->ptr))
+ ret = -EBADF;
+ break;
default:
BUG();
}
+
+ return ret;
}
static void cr_objhash_clear(struct cr_objhash *objhash)
@@ -144,16 +155,22 @@ static struct cr_objref *cr_obj_new(struct cr_ctx *ctx, void *ptr, int objref,
unsigned short type, unsigned short flags)
{
struct cr_objref *obj;
- int i;
+ int i, ret;
obj = kmalloc(sizeof(*obj), GFP_KERNEL);
if (!obj)
- return NULL;
+ return ERR_PTR(-ENOMEM);
obj->ptr = ptr;
obj->type = type;
obj->flags = flags;
+ ret = cr_obj_ref_grab(obj);
+ if (ret < 0) {
+ kfree(obj);
+ return ERR_PTR(ret);
+ }
+
if (objref) {
/* use @objref to index (restart) */
obj->objref = objref;
@@ -165,7 +182,6 @@ static struct cr_objref *cr_obj_new(struct cr_ctx *ctx, void *ptr, int objref,
}
hlist_add_head(&obj->hash, &ctx->objhash->head[i]);
- cr_obj_ref_grab(obj);
return obj;
}
@@ -198,8 +214,8 @@ int cr_obj_add_ptr(struct cr_ctx *ctx, void *ptr, int *objref,
obj = cr_obj_find_by_ptr(ctx, ptr);
if (!obj) {
obj = cr_obj_new(ctx, ptr, 0, type, flags);
- if (!obj)
- return -ENOMEM;
+ if (IS_ERR(obj))
+ return PTR_ERR(obj);
else
ret = 1;
} else if (obj->type != type) /* sanity check */
@@ -229,7 +245,7 @@ int cr_obj_add_ref(struct cr_ctx *ctx, void *ptr, int objref,
struct cr_objref *obj;
obj = cr_obj_new(ctx, ptr, objref, type, flags);
- return obj ? 0 : -ENOMEM;
+ return IS_ERR(obj) ? PTR_ERR(obj) : 0;
}
/**
diff --git a/checkpoint/rstr_file.c b/checkpoint/rstr_file.c
index f44b081..e3b684e 100644
--- a/checkpoint/rstr_file.c
+++ b/checkpoint/rstr_file.c
@@ -65,6 +65,55 @@ static int cr_attach_get_file(struct file *file)
return fd;
}
+/**
+ * cr_obj_add_file - register a file pointer of a given fd in hash table
+ * @ctx: checkpoint context
+ * @fd: file descriptor
+ * @objref: objrect reference
+ *
+ * Return the file pointer (will be safely referenced in the hash table)
+ */
+static struct file *cr_obj_add_file(struct cr_ctx *ctx, int fd, int objref)
+{
+ struct file *file;
+ int ret;
+
+ file = fget(fd);
+ if (!file)
+ return ERR_PTR(-EBADF);
+ ret = cr_obj_add_ref(ctx, file, objref, CR_OBJ_FILE, 0);
+ fput(file);
+ return (ret < 0 ? ERR_PTR(ret) : file);
+}
+
+/* return a new fd associated with a the file referenced by @hh->objref */
+static int
+cr_read_fd_objref(struct cr_ctx *ctx, struct cr_hdr_fd_data *hh, int rparent)
+{
+ struct file *file;
+
+ file = cr_obj_get_by_ref(ctx, hh->fd_objref, CR_OBJ_FILE);
+ if (IS_ERR(file))
+ return PTR_ERR(file);
+ return cr_attach_get_file(file);
+}
+
+/* return a new fd associated with a new open file/directory */
+static
+int cr_read_fd_file(struct cr_ctx *ctx, struct cr_hdr_fd_data *hh, int rparent)
+{
+ struct file *file;
+ int fd;
+
+ file = cr_read_open_fname(ctx, hh->f_flags, hh->f_mode);
+ if (IS_ERR(file))
+ return PTR_ERR(file);
+ fd = cr_attach_file(file);
+ if (fd < 0)
+ filp_close(file, NULL);
+ return fd;
+}
+
#define CR_SETFL_MASK (O_APPEND|O_NONBLOCK|O_NDELAY|FASYNC|O_DIRECT|O_NOATIME)
/* cr_read_fd_data - restore the state of a given file pointer */
@@ -74,7 +123,7 @@ cr_read_fd_data(struct cr_ctx *ctx, struct files_struct *files, int rparent)
struct cr_hdr_fd_data *hh = cr_hbuf_get(ctx, sizeof(*hh));
struct file *file;
int parent, ret;
- int fd = 0; /* pacify gcc warning */
+ int fd;
parent = cr_read_obj_type(ctx, hh, sizeof(*hh), CR_HDR_FD_DATA);
pr_debug("rparent %d parent %d flags %#x mode %#x how %d\n",
@@ -88,47 +137,45 @@ cr_read_fd_data(struct cr_ctx *ctx, struct files_struct *files, int rparent)
if (parent != rparent)
goto out;
+ if (hh->fd_objref < 0)
+ goto out;
/* FIX: more sanity checks on f_flags, f_mode etc */
switch (hh->fd_type) {
case CR_FD_FILE:
case CR_FD_DIR:
- file = cr_read_open_fname(ctx, hh->f_flags, hh->f_mode);
+ fd = cr_read_fd_file(ctx, hh, rparent);
+ break;
+ case CR_FD_OBJREF:
+ fd = cr_read_fd_objref(ctx, hh, rparent);
break;
default:
goto out;
}
- if (IS_ERR(file)) {
- ret = PTR_ERR(file);
+ if (fd < 0) {
+ ret = fd;
goto out;
}
/* FIX: need to restore uid, gid, owner etc */
- /* adding <objref,file> to the hash will keep a reference to it */
- ret = cr_obj_add_ref(ctx, file, parent, CR_OBJ_FILE, 0);
- if (ret < 0) {
- filp_close(file, NULL);
- goto out;
- }
-
- fd = cr_attach_file(file); /* no need to cleanup 'file' below */
- if (fd < 0) {
- ret = fd;
- filp_close(file, NULL);
+ /* register new <objref, file> tuple in hash table */
+ file = cr_obj_add_file(ctx, fd, rparent);
+ if (IS_ERR(file)) {
+ ret = PTR_ERR(file);
goto out;
}
- ret = sys_fcntl(fd, F_SETFL, hh->f_flags & CR_SETFL_MASK);
- if (ret < 0)
- goto out;
ret = vfs_llseek(file, hh->f_pos, SEEK_SET);
if (ret == -ESPIPE) /* ignore error on non-seekable files */
ret = 0;
- ret = 0;
+ if (ret < 0)
+ goto out;
+
+ ret = sys_fcntl(fd, F_SETFL, hh->f_flags & CR_SETFL_MASK);
out:
cr_hbuf_put(ctx, sizeof(*hh));
return ret < 0 ? ret : fd;
diff --git a/include/linux/checkpoint.h b/include/linux/checkpoint.h
index 217cf6e..bf89c0f 100644
--- a/include/linux/checkpoint.h
+++ b/include/linux/checkpoint.h
@@ -75,6 +75,7 @@ extern void cr_ctx_put(struct cr_ctx *ctx);
enum {
CR_OBJ_FILE = 1,
+ CR_OBJ_INODE,
CR_OBJ_MAX
};
diff --git a/include/linux/checkpoint_hdr.h b/include/linux/checkpoint_hdr.h
index 6dc739f..aba99f4 100644
--- a/include/linux/checkpoint_hdr.h
+++ b/include/linux/checkpoint_hdr.h
@@ -144,13 +144,16 @@ struct cr_hdr_fd_ent {
/* fd types */
enum fd_type {
- CR_FD_FILE = 1,
+ CR_FD_OBJREF = 1,
+ CR_FD_FILE,
CR_FD_DIR,
};
struct cr_hdr_fd_data {
- __u16 fd_type;
- __u16 f_mode;
+ __u32 fd_type;
+ __u32 fd_objref;
+
+ __u32 f_mode;
__u32 f_flags;
__u64 f_pos;
__u64 f_version;
--
1.5.4.3
^ permalink raw reply related [flat|nested] 17+ messages in thread* [RFC cr-pipe-v13][PATCH 2/3] Checkpoint open pipes
[not found] ` <1233091395-24582-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-01-27 21:23 ` Oren Laadan
2009-01-27 21:23 ` [RFC cr-pipe-v13][PATCH 1/3] A new file type (CR_FD_OBJREF) for a file descriptor already setup Oren Laadan
@ 2009-01-27 21:23 ` Oren Laadan
[not found] ` <1233091395-24582-4-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-01-27 21:23 ` [RFC cr-pipe-v13][PATCH 3/3] Restore " Oren Laadan
2009-01-27 21:29 ` [RFC cr-pipe-v13][PATCH 0/3] c/r of " Oren Laadan
4 siblings, 1 reply; 17+ messages in thread
From: Oren Laadan @ 2009-01-27 21:23 UTC (permalink / raw)
To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Dave Hansen
A pipe is essentially a double-headed inode with a buffer attached to
it. We checkpoint the pipe buffer only once, as soon as we hit one
side of the pipe, regardless whether it is read- or write- end.
To checkpoint a file descriptor that refers to a pipe (either end), we
first lookup the inode in the hash table:
If not found, it is the first encounter of this pipe. Besides the file
descriptor, we also (a) save the pipe data, and (b) register the pipe
inode in the hash. We save the 'objref' of the inode 'in ->fd_objref'
of the file descriptor. The file descriptor type becomes CR_FD_PIPE.
If found, it is the second encounter of this pipe, namely, as we hit
the other end of the same pipe. In this case we need only record the
reference ('objref') to the inode that we had saved before, and the
file descriptor type is changed to CR_FD_OBJREF.
The type CR_FD_PIPE will indicate to the kernel to create a new pipe;
since both ends are created at the same time, one end will be used,
and the other end will be deposited in the hash table for later use.
The type CR_FD_OBJREF will indicate that the corresponding file
descriptor is already setup and registered in the hash using the
'->fd_objref' that it had been assigned.
The format of the pipe data is as follows:
struct cr_hdr_fd_pipe {
__u32 nr_bufs;
}
cr_hdr + cr_hdr_fd_ent
cr_hdr + cr_hdr_fd_data
cr_hdr + cr_hdr_fd_pipe -> # buffers
cr_hdr + cr_hdr_buffer -> 1st buffer
cr_hdr + cr_hdr_buffer -> 2nd buffer
cr_hdr + cr_hdr_buffer -> 3rd buffer
...
Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
checkpoint/ckpt_file.c | 108 +++++++++++++++++++++++++++++++++++++++-
include/linux/checkpoint_hdr.h | 7 +++
2 files changed, 114 insertions(+), 1 deletions(-)
diff --git a/checkpoint/ckpt_file.c b/checkpoint/ckpt_file.c
index f5263f2..9ff401a 100644
--- a/checkpoint/ckpt_file.c
+++ b/checkpoint/ckpt_file.c
@@ -12,6 +12,7 @@
#include <linux/sched.h>
#include <linux/file.h>
#include <linux/fdtable.h>
+#include <linux/pipe_fs_i.h>
#include <linux/checkpoint.h>
#include <linux/checkpoint_hdr.h>
@@ -72,6 +73,81 @@ int cr_scan_fds(struct files_struct *files, int **fdtable)
return n;
}
+/* cr_write_pipebuf - dump contents of a pipe/fifo (assume i_mutex taken) */
+static int cr_write_pipebuf(struct cr_ctx *ctx, struct pipe_inode_info *pipe)
+{
+ struct cr_hdr h;
+ void *kbuf, *addr;
+ int i, ret = 0;
+
+ kbuf = (void *) __get_free_page(GFP_KERNEL);
+ if (!kbuf)
+ return -ENOMEM;
+
+ /* this is a simplified fs/pipe.c:read_pipe() */
+
+ for (i = 0; i < pipe->nrbufs; i++) {
+ int nn = (pipe->curbuf + i) & (PIPE_BUFFERS-1);
+ struct pipe_buffer *pbuf = pipe->bufs + nn;
+ const struct pipe_buf_operations *ops = pbuf->ops;
+
+ ret = ops->confirm(pipe, pbuf);
+ if (ret < 0)
+ break;
+
+ addr = ops->map(pipe, pbuf, 1);
+ memcpy(kbuf, addr + pbuf->offset, pbuf->len);
+ ops->unmap(pipe, pbuf, addr);
+
+ h.type = CR_HDR_BUFFER;
+ h.len = pbuf->len;
+ h.parent = 0;
+
+ ret = cr_write_obj(ctx, &h, kbuf);
+ if (ret < 0)
+ break;
+ }
+
+ free_page((unsigned long) kbuf);
+ return ret;
+}
+
+/* cr_write_pipe - dump pipe (assume i_mutex taken) */
+static int cr_write_pipe(struct cr_ctx *ctx, struct inode *inode, int parent)
+{
+ struct cr_hdr h;
+ struct cr_hdr_fd_pipe *hh;
+ struct pipe_inode_info *pipe = inode->i_pipe;
+ int ret;
+
+ h.type = CR_HDR_FD_PIPE;
+ h.len = sizeof(*hh);
+ h.parent = parent;
+
+ hh = cr_hbuf_get(ctx, sizeof(*hh));
+ hh->nr_bufs = pipe->nrbufs;
+
+ ret = cr_write_obj(ctx, &h, hh);
+ cr_hbuf_put(ctx, sizeof(*hh));
+ if (ret < 0)
+ return ret;
+
+ return cr_write_pipebuf(ctx, pipe);
+}
+
+/* cr_write_fd_fifo - for pipe */
+static int cr_write_fd_pipe(struct cr_ctx *ctx, struct file *file, int parent)
+{
+ struct inode *inode = file->f_dentry->d_inode;
+ int ret;
+
+ mutex_lock(&inode->i_mutex);
+ ret = cr_write_pipe(ctx, inode, parent);
+ mutex_unlock(&inode->i_mutex);
+
+ return ret;
+}
+
/* cr_write_fd_file - for regular files, directories, symbolic links */
static int cr_write_fd_file(struct cr_ctx *ctx, struct file *file, int parent)
{
@@ -86,11 +162,33 @@ static inline enum fd_type cr_inode_to_fdtype(struct inode *inode)
return CR_FD_FILE;
case S_IFDIR:
return CR_FD_DIR;
+ case S_IFIFO:
+ if (inode->i_sb->s_magic == PIPEFS_MAGIC)
+ return CR_FD_PIPE; /* pipe */
}
/* file type unsupported */
return -EBADF;
}
+/* cr_inode_to_objref: */
+static int
+cr_inode_to_objref(struct cr_ctx *ctx, struct inode *inode, int type, int *new)
+{
+ int objref = 0;
+ int newobj = 1;
+
+ if (type == CR_FD_PIPE) {
+ newobj = cr_obj_add_ptr(ctx, inode, &objref, CR_OBJ_INODE, 0);
+ pr_debug("objref %d inode %p new %d\n", objref, inode, newobj);
+ }
+
+ if (newobj < 0)
+ return newobj;
+
+ *new = newobj;
+ return objref;
+}
+
/* cr_write_fd_data - dump the state of a given file pointer */
static int cr_write_fd_data(struct cr_ctx *ctx, struct file *file, int parent)
{
@@ -98,6 +196,7 @@ static int cr_write_fd_data(struct cr_ctx *ctx, struct file *file, int parent)
struct cr_hdr_fd_data *hh = cr_hbuf_get(ctx, sizeof(*hh));
struct inode *inode = file->f_dentry->d_inode;
enum fd_type fd_type;
+ int new = 0; /* pacify gcc */
int ret;
h.type = CR_HDR_FD_DATA;
@@ -116,8 +215,12 @@ static int cr_write_fd_data(struct cr_ctx *ctx, struct file *file, int parent)
return fd_type;
}
+ hh->fd_objref = cr_inode_to_objref(ctx, inode, hh->fd_type, &new);
+ pr_debug("type %d objref %d (%d)\n", hh->fd_type, hh->fd_objref, new);
+ if (!new)
+ fd_type = CR_FD_OBJREF;
+
hh->fd_type = fd_type;
- hh->fd_objref = 0;
/* FIX: check if the inode is unlinked */
@@ -130,6 +233,9 @@ static int cr_write_fd_data(struct cr_ctx *ctx, struct file *file, int parent)
case CR_FD_OBJREF:
/* nothing to do */
break;
+ case CR_FD_PIPE:
+ ret = cr_write_fd_pipe(ctx, file, parent);
+ break;
case CR_FD_FILE:
case CR_FD_DIR:
ret = cr_write_fd_file(ctx, file, parent);
diff --git a/include/linux/checkpoint_hdr.h b/include/linux/checkpoint_hdr.h
index aba99f4..0cb1963 100644
--- a/include/linux/checkpoint_hdr.h
+++ b/include/linux/checkpoint_hdr.h
@@ -58,6 +58,7 @@ enum {
CR_HDR_FILES = 301,
CR_HDR_FD_ENT,
CR_HDR_FD_DATA,
+ CR_HDR_FD_PIPE,
CR_HDR_TAIL = 5001
};
@@ -147,6 +148,7 @@ enum fd_type {
CR_FD_OBJREF = 1,
CR_FD_FILE,
CR_FD_DIR,
+ CR_FD_PIPE,
};
struct cr_hdr_fd_data {
@@ -159,4 +161,9 @@ struct cr_hdr_fd_data {
__u64 f_version;
} __attribute__((aligned(8)));
+struct cr_hdr_fd_pipe {
+ __u32 nr_bufs;
+} __attribute__((aligned(8)));
+
+
#endif /* _CHECKPOINT_CKPT_HDR_H_ */
--
1.5.4.3
^ permalink raw reply related [flat|nested] 17+ messages in thread* [RFC cr-pipe-v13][PATCH 3/3] Restore open pipes
[not found] ` <1233091395-24582-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
` (2 preceding siblings ...)
2009-01-27 21:23 ` [RFC cr-pipe-v13][PATCH 2/3] Checkpoint open pipes Oren Laadan
@ 2009-01-27 21:23 ` Oren Laadan
[not found] ` <1233091395-24582-5-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-01-27 21:29 ` [RFC cr-pipe-v13][PATCH 0/3] c/r of " Oren Laadan
4 siblings, 1 reply; 17+ messages in thread
From: Oren Laadan @ 2009-01-27 21:23 UTC (permalink / raw)
To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Dave Hansen
When seeing a CR_FD_PIPE file type, we create a new pipe and thus
have two file pointers (read- and write- ends). We only use one of
them, depending on which side was checkpointed first. We register the
file pointer of the other end in the hash table, with the 'objref'
given for this pipe from the checkpoint, deposited for later use. At
this point we also restore the contents of the pipe buffers.
When the other end arrives, it will have file type CR_FD_OBJREF. We
will then use the corresponding 'objref' to retrieve the file pointer
from the hash table, and attach it to the process.
Note the difference from the checkpoint logic: during checkpoint we
placed the _inode_ of the pipe in the hash table, while during restart
we place the resulting _file_ in the hash table.
We restore the pipe contents we manually allocation and attaching
buffers to the pipe; (alternatively we could read the data from the
image file and then write it into the pipe, or use splice() syscall).
Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
checkpoint/checkpoint_file.h | 2 +
checkpoint/rstr_file.c | 122 +++++++++++++++++++++++++++++++++++++++++-
fs/pipe.c | 2 +-
3 files changed, 123 insertions(+), 3 deletions(-)
diff --git a/checkpoint/checkpoint_file.h b/checkpoint/checkpoint_file.h
index 9dc3eba..cac1a5d 100644
--- a/checkpoint/checkpoint_file.h
+++ b/checkpoint/checkpoint_file.h
@@ -14,4 +14,6 @@
int cr_scan_fds(struct files_struct *files, int **fdtable);
+extern const struct pipe_buf_operations anon_pipe_buf_ops;
+
#endif /* _CHECKPOINT_CKPT_FILE_H_ */
diff --git a/checkpoint/rstr_file.c b/checkpoint/rstr_file.c
index e3b684e..444ebed 100644
--- a/checkpoint/rstr_file.c
+++ b/checkpoint/rstr_file.c
@@ -14,6 +14,8 @@
#include <linux/file.h>
#include <linux/fdtable.h>
#include <linux/fsnotify.h>
+#include <linux/pagemap.h>
+#include <linux/pipe_fs_i.h>
#include <linux/syscalls.h>
#include <linux/checkpoint.h>
#include <linux/checkpoint_hdr.h>
@@ -86,6 +88,119 @@ static struct file *cr_obj_add_file(struct cr_ctx *ctx, int fd, int objref)
return (ret < 0 ? ERR_PTR(ret) : file);
}
+/* cr_read_pipebuf - restore contents of a pipe/fifo (assume i_mutex taken) */
+static int
+cr_read_pipebuf(struct cr_ctx *ctx, struct pipe_inode_info *pipe, int nbufs)
+{
+ void *kbuf, *addr;
+ int i, ret = 0;
+
+ kbuf = (void *) __get_free_page(GFP_KERNEL);
+ if (!kbuf)
+ return -ENOMEM;
+
+ for (i = 0; i < nbufs; i++) {
+ struct pipe_buffer *pbuf = pipe->bufs + i;
+ struct page *page;
+ int len = PAGE_SIZE;
+
+ ret = cr_read_buffer(ctx, kbuf, &len);
+ if (ret < 0)
+ break;
+ page = alloc_page(GFP_HIGHUSER);
+ if (!page) {
+ ret = -ENOMEM;
+ break;
+ }
+
+ addr = kmap_atomic(page, KM_USER0);
+ memcpy(addr, kbuf, len);
+ kunmap_atomic(addr, KM_USER0);
+
+ pbuf->page = page;
+ pbuf->ops = &anon_pipe_buf_ops;
+ pbuf->offset = 0;
+ pbuf->len = len;
+ pipe->nrbufs++;
+ pipe->tmp_page = NULL;
+ }
+
+ free_page((unsigned long) kbuf);
+ return ret;
+}
+
+/* cr_read_pipe - restore pipe (assume i_mutex taken) */
+static int cr_read_pipe(struct cr_ctx *ctx, int pipefd)
+{
+ struct cr_hdr_fd_pipe *hh;
+ struct file *file;
+ struct inode *inode;
+ int nbufs, ret;
+
+ hh = cr_hbuf_get(ctx, sizeof(*hh));
+ ret = cr_read_obj_type(ctx, hh, sizeof(*hh), CR_HDR_FD_PIPE);
+ nbufs = hh->nr_bufs;
+ cr_hbuf_put(ctx, sizeof(*hh));
+
+ if (ret < 0)
+ return ret;
+ if (nbufs > PIPE_BUFFERS)
+ return -EINVAL;
+
+ file = fget(pipefd);
+ if (!file)
+ return -EIO;
+
+ inode = file->f_dentry->d_inode;
+ mutex_lock(&inode->i_mutex);
+ ret = cr_read_pipebuf(ctx, inode->i_pipe, nbufs);
+ mutex_unlock(&inode->i_mutex);
+
+ fput(file);
+ return ret;
+}
+
+/* restore a pipe */
+static int
+cr_read_fd_pipe(struct cr_ctx *ctx, struct cr_hdr_fd_data *hh, int rparent)
+{
+ struct file *file;
+ int fds[2], which, ret;
+
+ file = cr_obj_get_by_ref(ctx, hh->fd_objref, CR_OBJ_FILE);
+ if (IS_ERR(file))
+ return PTR_ERR(file);
+ else if (file)
+ return cr_attach_get_file(file);
+
+ /* first encounter of this pipe: create it */
+ ret = do_pipe(fds);
+ if (ret < 0)
+ return ret;
+
+ which = (hh->f_flags & O_WRONLY ? 1 : 0);
+
+ /*
+ * Below we return the fd corersponding to one side of the pipe
+ * for our caller to use. Now register the other side of the pipe
+ * in the hash, to be picked up when that side is to be restored.
+ */
+ file = cr_obj_add_file(ctx, fds[1-which], hh->fd_objref);
+ if (IS_ERR(file)) {
+ ret = PTR_ERR(file);
+ goto out;
+ }
+
+ ret = cr_read_pipe(ctx, fds[which]);
+ out:
+ sys_close(fds[1-which]); /* this side isn't used anymore */
+ if (ret < 0)
+ sys_close(fds[which]);
+ else
+ ret = fds[which];
+ return ret;
+}
+
/* return a new fd associated with a the file referenced by @hh->objref */
static int
cr_read_fd_objref(struct cr_ctx *ctx, struct cr_hdr_fd_data *hh, int rparent)
@@ -143,12 +258,15 @@ cr_read_fd_data(struct cr_ctx *ctx, struct files_struct *files, int rparent)
/* FIX: more sanity checks on f_flags, f_mode etc */
switch (hh->fd_type) {
+ case CR_FD_OBJREF:
+ fd = cr_read_fd_objref(ctx, hh, rparent);
+ break;
case CR_FD_FILE:
case CR_FD_DIR:
fd = cr_read_fd_file(ctx, hh, rparent);
break;
- case CR_FD_OBJREF:
- fd = cr_read_fd_objref(ctx, hh, rparent);
+ case CR_FD_PIPE:
+ fd = cr_read_fd_pipe(ctx, hh, rparent);
break;
default:
goto out;
diff --git a/fs/pipe.c b/fs/pipe.c
index 7aea8b8..8c551ac 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -268,7 +268,7 @@ int generic_pipe_buf_confirm(struct pipe_inode_info *info,
return 0;
}
-static const struct pipe_buf_operations anon_pipe_buf_ops = {
+const struct pipe_buf_operations anon_pipe_buf_ops = {
.can_merge = 1,
.map = generic_pipe_buf_map,
.unmap = generic_pipe_buf_unmap,
--
1.5.4.3
^ permalink raw reply related [flat|nested] 17+ messages in thread* Re: [RFC cr-pipe-v13][PATCH 0/3] c/r of open pipes
[not found] ` <1233091395-24582-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
` (3 preceding siblings ...)
2009-01-27 21:23 ` [RFC cr-pipe-v13][PATCH 3/3] Restore " Oren Laadan
@ 2009-01-27 21:29 ` Oren Laadan
4 siblings, 0 replies; 17+ messages in thread
From: Oren Laadan @ 2009-01-27 21:29 UTC (permalink / raw)
To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Dave Hansen
The patches can pulled from the git tree (branch 'ckpt-v13-dev')
git://git.ncl.cs.columbia.edu/pub/git/linux-cr.git
Oren Laadan wrote:
> Adds support for chekcpoint-restart of open pipes.
>
> This patchset applies on top of 'ckpt-v13' recent patchset.
>
> A new test program (test4.c in userspace tools) provides a test
> case for this new functionality.
>
> Oren.
>
>
>
^ permalink raw reply [flat|nested] 17+ messages in thread