From: "Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
To: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
Dave Hansen
<dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Subject: Re: [RFC v14-rc2][PATCH 17/29] Checkpoint open pipes
Date: Wed, 1 Apr 2009 14:47:26 -0500 [thread overview]
Message-ID: <20090401194726.GA24066@us.ibm.com> (raw)
In-Reply-To: <1238477349-11029-18-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> A pipe is essentially a double-headed inode with a buffer attached to
> it. We checkpoint the pipe buffer only once, as soon as we hit one
> side of the pipe, regardless whether it is read- or write- end.
>
> To checkpoint a file descriptor that refers to a pipe (either end), we
> first lookup the inode in the hash table:
>
> If not found, it is the first encounter of this pipe. Besides the file
> descriptor, we also (a) save the pipe data, and (b) register the pipe
> inode in the hash. We save the 'objref' of the inode 'in ->fd_objref'
> of the file descriptor. The file descriptor type becomes CR_FD_PIPE.
>
> If found, it is the second encounter of this pipe, namely, as we hit
> the other end of the same pipe. In this case we need only record the
> reference ('objref') to the inode that we had saved before, and the
> file descriptor type is changed to CR_FD_OBJREF.
>
> The type CR_FD_PIPE will indicate to the kernel to create a new pipe;
> since both ends are created at the same time, one end will be used,
> and the other end will be deposited in the hash table for later use.
> The type CR_FD_OBJREF will indicate that the corresponding file
> descriptor is already setup and registered in the hash using the
> '->fd_objref' that it had been assigned.
>
> The format of the pipe data is as follows:
>
> struct cr_hdr_fd_pipe {
> __u32 nr_bufs;
> }
>
> cr_hdr + cr_hdr_fd_ent
> cr_hdr + cr_hdr_fd_data
> cr_hdr + cr_hdr_fd_pipe -> # buffers
> cr_hdr + cr_hdr_buffer -> 1st buffer
> cr_hdr + cr_hdr_buffer -> 2nd buffer
> cr_hdr + cr_hdr_buffer -> 3rd buffer
> ...
>
> Changelog[v14]:
> - Use 'fd_type' instead of 'hh->fd_objref' in cr_write_fd_data()
> - Revert change to pr_debug(), back to cr_debug()
> - Discard the 'h.parent' field
> - Check whether calls to cr_hbuf_get() fail
> - Test that a pipe's inode != ctx->file's inode to prevent deadlock
>
> Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
But:
> ---
> checkpoint/ckpt_file.c | 2 +
> fs/pipe.c | 113 ++++++++++++++++++++++++++++++++++++++++
> include/linux/checkpoint_hdr.h | 8 +++-
> 3 files changed, 122 insertions(+), 1 deletions(-)
>
> diff --git a/checkpoint/ckpt_file.c b/checkpoint/ckpt_file.c
> index 0fe68bf..dd26b3d 100644
> --- a/checkpoint/ckpt_file.c
> +++ b/checkpoint/ckpt_file.c
> @@ -12,6 +12,7 @@
> #include <linux/sched.h>
> #include <linux/file.h>
> #include <linux/fdtable.h>
> +#include <linux/pipe_fs_i.h>
> #include <linux/checkpoint.h>
> #include <linux/checkpoint_hdr.h>
>
> @@ -72,6 +73,7 @@ int cr_scan_fds(struct files_struct *files, int **fdtable)
> return n;
> }
>
> +
> static int cr_write_file_generic(struct cr_ctx *ctx, struct file *file,
> struct cr_hdr_file *hh)
> {
> diff --git a/fs/pipe.c b/fs/pipe.c
> index 14f502b..0c3f391 100644
> --- a/fs/pipe.c
> +++ b/fs/pipe.c
> @@ -22,6 +22,9 @@
> #include <asm/uaccess.h>
> #include <asm/ioctls.h>
>
> +#include <linux/checkpoint.h>
> +#include <linux/checkpoint_hdr.h>
> +
> /*
> * We use a start+len construction, which provides full use of the
> * allocated memory.
> @@ -771,6 +774,113 @@ pipe_rdwr_open(struct inode *inode, struct file *filp)
> return 0;
> }
>
> +/* cr_write_pipebuf - dump contents of a pipe/fifo (assume i_mutex taken) */
> +static int cr_write_pipebuf(struct cr_ctx *ctx, struct pipe_inode_info *pipe)
> +{
> + struct cr_hdr h;
> + void *kbuf, *addr;
> + int i, ret = 0;
> +
> + kbuf = (void *) __get_free_page(GFP_KERNEL);
> + if (!kbuf)
> + return -ENOMEM;
> +
> + /* this is a simplified fs/pipe.c:read_pipe() */
pipe_read() actually :)
> +
> + for (i = 0; i < pipe->nrbufs; i++) {
> + int nn = (pipe->curbuf + i) & (PIPE_BUFFERS-1);
> + struct pipe_buffer *pbuf = pipe->bufs + nn;
> + const struct pipe_buf_operations *ops = pbuf->ops;
> +
> + ret = ops->confirm(pipe, pbuf);
> + if (ret < 0)
> + break;
not that it seems to matter, but pipe_read() returns error
also if ret > 0.
> +
> + addr = ops->map(pipe, pbuf, 1);
> + memcpy(kbuf, addr + pbuf->offset, pbuf->len);
> + ops->unmap(pipe, pbuf, addr);
> +
> + h.type = CR_HDR_BUFFER;
> + h.len = pbuf->len;
> +
> + ret = cr_write_obj(ctx, &h, kbuf);
> + if (ret < 0)
> + break;
> + }
> +
> + free_page((unsigned long) kbuf);
> + return ret;
> +}
> +
> +/* cr_write_pipe - dump pipe (assume i_mutex taken) */
> +static int cr_write_pipe(struct cr_ctx *ctx, struct inode *inode)
> +{
> + struct cr_hdr h;
> + struct cr_hdr_fd_pipe *hh;
> + struct pipe_inode_info *pipe = inode->i_pipe;
> + int ret;
> +
> + h.type = CR_HDR_FD_PIPE;
> + h.len = sizeof(*hh);
> +
> + hh = cr_hbuf_get(ctx, sizeof(*hh));
> + if (!hh)
> + return -ENOMEM;
> +
> + hh->nr_bufs = pipe->nrbufs;
> +
> + ret = cr_write_obj(ctx, &h, hh);
> + cr_hbuf_put(ctx, sizeof(*hh));
> + if (ret < 0)
> + return ret;
> +
> + return cr_write_pipebuf(ctx, pipe);
> +}
> +
> +static int pipe_file_checkpoint(struct cr_ctx *ctx,
> + struct file *file, struct cr_hdr_file *hh)
> +{
> + struct cr_hdr h;
> + struct inode *inode = file->f_dentry->d_inode;
> + int new, objref;
> + int ret;
> +
> + /*
> + * We take the inode's mutex and later will call vfs_write(),
> + * which also takes an inode's mutex. To avoid deadlock, make
> + * sure that the two inodes are distinct.
> + */
> + if (ctx->file->f_dentry->d_inode == inode) {
> + pr_warning("c/r: writing to pipe that is checkpointed "
> + "may result in a deadlock ... aborting\n");
> + return -EDEADLK;
> + }
> +
> + h.type = CR_HDR_FILE;
> + h.len = sizeof(*hh);
> +
> + new = cr_obj_add_ptr(ctx, inode, &objref, CR_OBJ_INODE, 0);
> + cr_debug("objref %d inode %p new %d\n", objref, inode, new);
> + if (new < 0)
> + return new;
> +
> + hh->fd_type = (new ? CR_FD_PIPE : CR_FD_OBJREF);
The git commit msg has a good explanation, but it's worth a comment
in the code as well, that on first instance we call it
CR_FD_PIPE and second time CR_FD_OBJREF.
> + hh->fd_objref = objref;
> +
> + ret = cr_write_obj(ctx, &h, hh);
> + if (ret < 0)
> + return ret;
> +
> + if (new) {
> + mutex_lock(&inode->i_mutex);
> + ret = cr_write_pipe(ctx, inode);
> + mutex_unlock(&inode->i_mutex);
> + }
> +
> + return ret;
> +}
> +
> +
> /*
> * The file_operations structs are not static because they
> * are also used in linux/fs/fifo.c to do operations on FIFOs.
> @@ -787,6 +897,7 @@ const struct file_operations read_pipefifo_fops = {
> .open = pipe_read_open,
> .release = pipe_read_release,
> .fasync = pipe_read_fasync,
> + .checkpoint = pipe_file_checkpoint,
> };
>
> const struct file_operations write_pipefifo_fops = {
> @@ -799,6 +910,7 @@ const struct file_operations write_pipefifo_fops = {
> .open = pipe_write_open,
> .release = pipe_write_release,
> .fasync = pipe_write_fasync,
> + .checkpoint = pipe_file_checkpoint,
> };
>
> const struct file_operations rdwr_pipefifo_fops = {
> @@ -812,6 +924,7 @@ const struct file_operations rdwr_pipefifo_fops = {
> .open = pipe_rdwr_open,
> .release = pipe_rdwr_release,
> .fasync = pipe_rdwr_fasync,
> + .checkpoint = pipe_file_checkpoint,
> };
>
> struct pipe_inode_info * alloc_pipe_info(struct inode *inode)
> diff --git a/include/linux/checkpoint_hdr.h b/include/linux/checkpoint_hdr.h
> index 9ad845d..ce5d880 100644
> --- a/include/linux/checkpoint_hdr.h
> +++ b/include/linux/checkpoint_hdr.h
> @@ -57,6 +57,7 @@ enum {
> CR_HDR_FD_TABLE = 301,
> CR_HDR_FD_ENT,
> CR_HDR_FILE,
> + CR_HDR_FD_PIPE,
>
> CR_HDR_TAIL = 5001
> };
> @@ -152,7 +153,8 @@ struct cr_hdr_fd_ent {
> /* fd types */
> enum fd_type {
> CR_FD_OBJREF = 1,
> - CR_FD_GENERIC
> + CR_FD_GENERIC,
> + CR_FD_PIPE,
> };
>
> struct cr_hdr_file {
> @@ -165,4 +167,8 @@ struct cr_hdr_file {
> __u64 f_version;
> } __attribute__((aligned(8)));
>
> +struct cr_hdr_fd_pipe {
> + __s32 nr_bufs;
> +} __attribute__((aligned(8)));
> +
> #endif /* _CHECKPOINT_CKPT_HDR_H_ */
> --
> 1.5.4.3
next prev parent reply other threads:[~2009-04-01 19:47 UTC|newest]
Thread overview: 66+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-03-31 5:28 [RFC v14-rc2][PATCH 00/29] Kernel based checkpoint/restart Oren Laadan
[not found] ` <1238477349-11029-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 01/29] Create syscalls: sys_checkpoint, sys_restart Oren Laadan
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 02/29] Checkpoint/restart: initial documentation Oren Laadan
[not found] ` <1238477349-11029-3-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 3:22 ` Sukadev Bhattiprolu
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 03/29] Make file_pos_read/write() public Oren Laadan
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 04/29] General infrastructure for checkpoint restart Oren Laadan
[not found] ` <1238477349-11029-5-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 3:24 ` Sukadev Bhattiprolu
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 05/29] x86 support for checkpoint/restart Oren Laadan
[not found] ` <1238477349-11029-6-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 3:25 ` Sukadev Bhattiprolu
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 06/29] Dump memory address space Oren Laadan
[not found] ` <1238477349-11029-7-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 3:26 ` Sukadev Bhattiprolu
[not found] ` <20090407032636.GD12316-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-04-07 4:57 ` Oren Laadan
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 07/29] Restore " Oren Laadan
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 08/29] Infrastructure for shared objects Oren Laadan
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 09/29] Dump open file descriptors Oren Laadan
[not found] ` <1238477349-11029-10-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 3:28 ` Sukadev Bhattiprolu
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 10/29] actually use f_op in checkpoint code Oren Laadan
[not found] ` <1238477349-11029-11-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-03-31 18:31 ` Oren Laadan
2009-04-01 18:54 ` Serge E. Hallyn
2009-04-07 3:29 ` Sukadev Bhattiprolu
[not found] ` <20090407032912.GF12316-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-04-07 5:36 ` Oren Laadan
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 11/29] add generic checkpoint f_op to ext fses Oren Laadan
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 12/29] Restore open file descriptors Oren Laadan
[not found] ` <1238477349-11029-13-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 3:29 ` Sukadev Bhattiprolu
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 13/29] External checkpoint of a task other than ourself Oren Laadan
[not found] ` <1238477349-11029-14-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 3:30 ` Sukadev Bhattiprolu
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 14/29] Checkpoint multiple processes Oren Laadan
[not found] ` <1238477349-11029-15-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 3:31 ` Sukadev Bhattiprolu
[not found] ` <20090407033111.GI12316-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-04-07 5:12 ` Oren Laadan
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 15/29] Restart " Oren Laadan
[not found] ` <1238477349-11029-16-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 3:33 ` Sukadev Bhattiprolu
[not found] ` <20090407033315.GJ12316-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-04-07 5:31 ` Oren Laadan
[not found] ` <49DAE526.6010900-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 16:29 ` Sukadev Bhattiprolu
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 16/29] A new file type (CR_FD_OBJREF) for a file descriptor already setup Oren Laadan
[not found] ` <1238477349-11029-17-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-01 13:59 ` Serge E. Hallyn
[not found] ` <20090401135952.GA16973-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-04-01 14:13 ` Oren Laadan
2009-04-01 18:36 ` Serge E. Hallyn
2009-04-03 15:46 ` Dan Smith
[not found] ` <87y6uhyc3j.fsf-FLMGYpZoEPULwtHQx/6qkW3U47Q5hpJU@public.gmane.org>
2009-04-03 16:25 ` Oren Laadan
[not found] ` <49D63865.1030807-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-03 16:30 ` Dan Smith
2009-04-03 16:54 ` Dave Hansen
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 17/29] Checkpoint open pipes Oren Laadan
[not found] ` <1238477349-11029-18-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-01 19:47 ` Serge E. Hallyn [this message]
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 18/29] Restore " Oren Laadan
[not found] ` <1238477349-11029-19-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-01 20:34 ` Serge E. Hallyn
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 19/29] Record 'struct file' object instead of the file name for VMAs Oren Laadan
[not found] ` <1238477349-11029-20-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-01 21:45 ` Serge E. Hallyn
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 20/29] Prepare to support shared memory Oren Laadan
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 21/29] Dump anonymous- and file-mapped- " Oren Laadan
[not found] ` <1238477349-11029-22-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-01 23:06 ` Serge E. Hallyn
[not found] ` <20090401230657.GB27725-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-04-01 23:18 ` Oren Laadan
[not found] ` <49D3F636.1070303-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-01 23:32 ` Serge E. Hallyn
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 22/29] Restore " Oren Laadan
[not found] ` <1238477349-11029-23-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-02 16:59 ` Serge E. Hallyn
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 23/29] s390: Expose a constant for the number of words representing the CRs Oren Laadan
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 24/29] c/r: Add CR_COPY() macro (v4) Oren Laadan
[not found] ` <1238477349-11029-25-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-01 23:20 ` Serge E. Hallyn
[not found] ` <20090401232013.GA31361-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-04-02 19:00 ` Dan Smith
[not found] ` <87vdpmnan2.fsf-FLMGYpZoEPULwtHQx/6qkW3U47Q5hpJU@public.gmane.org>
2009-04-02 19:06 ` Serge E. Hallyn
[not found] ` <20090402190612.GA24390-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-04-02 20:22 ` Dan Smith
[not found] ` <87r60an6us.fsf-FLMGYpZoEPULwtHQx/6qkW3U47Q5hpJU@public.gmane.org>
2009-04-05 20:25 ` Oren Laadan
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 25/29] s390: define s390-specific checkpoint-restart code (v7) Oren Laadan
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 26/29] powerpc: provide APIs for validating and updating DABR Oren Laadan
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 27/29] powerpc: checkpoint/restart implementation Oren Laadan
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 28/29] powerpc: wire up checkpoint and restart syscalls Oren Laadan
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 29/29] powerpc: enable checkpoint support in Kconfig Oren Laadan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090401194726.GA24066@us.ibm.com \
--to=serue-r/jw6+rmf7hqt0dzr+alfa@public.gmane.org \
--cc=containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
--cc=dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org \
--cc=orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.