From: "Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
To: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
Dave Hansen
<dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Subject: Re: [RFC v14-rc2][PATCH 17/29] Checkpoint open pipes
Date: Wed, 1 Apr 2009 14:47:26 -0500 [thread overview]
Message-ID: <20090401194726.GA24066@us.ibm.com> (raw)
In-Reply-To: <1238477349-11029-18-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> A pipe is essentially a double-headed inode with a buffer attached to
> it. We checkpoint the pipe buffer only once, as soon as we hit one
> side of the pipe, regardless whether it is read- or write- end.
>
> To checkpoint a file descriptor that refers to a pipe (either end), we
> first lookup the inode in the hash table:
>
> If not found, it is the first encounter of this pipe. Besides the file
> descriptor, we also (a) save the pipe data, and (b) register the pipe
> inode in the hash. We save the 'objref' of the inode 'in ->fd_objref'
> of the file descriptor. The file descriptor type becomes CR_FD_PIPE.
>
> If found, it is the second encounter of this pipe, namely, as we hit
> the other end of the same pipe. In this case we need only record the
> reference ('objref') to the inode that we had saved before, and the
> file descriptor type is changed to CR_FD_OBJREF.
>
> The type CR_FD_PIPE will indicate to the kernel to create a new pipe;
> since both ends are created at the same time, one end will be used,
> and the other end will be deposited in the hash table for later use.
> The type CR_FD_OBJREF will indicate that the corresponding file
> descriptor is already setup and registered in the hash using the
> '->fd_objref' that it had been assigned.
>
> The format of the pipe data is as follows:
>
> struct cr_hdr_fd_pipe {
> __u32 nr_bufs;
> }
>
> cr_hdr + cr_hdr_fd_ent
> cr_hdr + cr_hdr_fd_data
> cr_hdr + cr_hdr_fd_pipe -> # buffers
> cr_hdr + cr_hdr_buffer -> 1st buffer
> cr_hdr + cr_hdr_buffer -> 2nd buffer
> cr_hdr + cr_hdr_buffer -> 3rd buffer
> ...
>
> Changelog[v14]:
> - Use 'fd_type' instead of 'hh->fd_objref' in cr_write_fd_data()
> - Revert change to pr_debug(), back to cr_debug()
> - Discard the 'h.parent' field
> - Check whether calls to cr_hbuf_get() fail
> - Test that a pipe's inode != ctx->file's inode to prevent deadlock
>
> Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
But:
> ---
> checkpoint/ckpt_file.c | 2 +
> fs/pipe.c | 113 ++++++++++++++++++++++++++++++++++++++++
> include/linux/checkpoint_hdr.h | 8 +++-
> 3 files changed, 122 insertions(+), 1 deletions(-)
>
> diff --git a/checkpoint/ckpt_file.c b/checkpoint/ckpt_file.c
> index 0fe68bf..dd26b3d 100644
> --- a/checkpoint/ckpt_file.c
> +++ b/checkpoint/ckpt_file.c
> @@ -12,6 +12,7 @@
> #include <linux/sched.h>
> #include <linux/file.h>
> #include <linux/fdtable.h>
> +#include <linux/pipe_fs_i.h>
> #include <linux/checkpoint.h>
> #include <linux/checkpoint_hdr.h>
>
> @@ -72,6 +73,7 @@ int cr_scan_fds(struct files_struct *files, int **fdtable)
> return n;
> }
>
> +
> static int cr_write_file_generic(struct cr_ctx *ctx, struct file *file,
> struct cr_hdr_file *hh)
> {
> diff --git a/fs/pipe.c b/fs/pipe.c
> index 14f502b..0c3f391 100644
> --- a/fs/pipe.c
> +++ b/fs/pipe.c
> @@ -22,6 +22,9 @@
> #include <asm/uaccess.h>
> #include <asm/ioctls.h>
>
> +#include <linux/checkpoint.h>
> +#include <linux/checkpoint_hdr.h>
> +
> /*
> * We use a start+len construction, which provides full use of the
> * allocated memory.
> @@ -771,6 +774,113 @@ pipe_rdwr_open(struct inode *inode, struct file *filp)
> return 0;
> }
>
> +/* cr_write_pipebuf - dump contents of a pipe/fifo (assume i_mutex taken) */
> +static int cr_write_pipebuf(struct cr_ctx *ctx, struct pipe_inode_info *pipe)
> +{
> + struct cr_hdr h;
> + void *kbuf, *addr;
> + int i, ret = 0;
> +
> + kbuf = (void *) __get_free_page(GFP_KERNEL);
> + if (!kbuf)
> + return -ENOMEM;
> +
> + /* this is a simplified fs/pipe.c:read_pipe() */
pipe_read() actually :)
> +
> + for (i = 0; i < pipe->nrbufs; i++) {
> + int nn = (pipe->curbuf + i) & (PIPE_BUFFERS-1);
> + struct pipe_buffer *pbuf = pipe->bufs + nn;
> + const struct pipe_buf_operations *ops = pbuf->ops;
> +
> + ret = ops->confirm(pipe, pbuf);
> + if (ret < 0)
> + break;
not that it seems to matter, but pipe_read() returns error
also if ret > 0.
> +
> + addr = ops->map(pipe, pbuf, 1);
> + memcpy(kbuf, addr + pbuf->offset, pbuf->len);
> + ops->unmap(pipe, pbuf, addr);
> +
> + h.type = CR_HDR_BUFFER;
> + h.len = pbuf->len;
> +
> + ret = cr_write_obj(ctx, &h, kbuf);
> + if (ret < 0)
> + break;
> + }
> +
> + free_page((unsigned long) kbuf);
> + return ret;
> +}
> +
> +/* cr_write_pipe - dump pipe (assume i_mutex taken) */
> +static int cr_write_pipe(struct cr_ctx *ctx, struct inode *inode)
> +{
> + struct cr_hdr h;
> + struct cr_hdr_fd_pipe *hh;
> + struct pipe_inode_info *pipe = inode->i_pipe;
> + int ret;
> +
> + h.type = CR_HDR_FD_PIPE;
> + h.len = sizeof(*hh);
> +
> + hh = cr_hbuf_get(ctx, sizeof(*hh));
> + if (!hh)
> + return -ENOMEM;
> +
> + hh->nr_bufs = pipe->nrbufs;
> +
> + ret = cr_write_obj(ctx, &h, hh);
> + cr_hbuf_put(ctx, sizeof(*hh));
> + if (ret < 0)
> + return ret;
> +
> + return cr_write_pipebuf(ctx, pipe);
> +}
> +
> +static int pipe_file_checkpoint(struct cr_ctx *ctx,
> + struct file *file, struct cr_hdr_file *hh)
> +{
> + struct cr_hdr h;
> + struct inode *inode = file->f_dentry->d_inode;
> + int new, objref;
> + int ret;
> +
> + /*
> + * We take the inode's mutex and later will call vfs_write(),
> + * which also takes an inode's mutex. To avoid deadlock, make
> + * sure that the two inodes are distinct.
> + */
> + if (ctx->file->f_dentry->d_inode == inode) {
> + pr_warning("c/r: writing to pipe that is checkpointed "
> + "may result in a deadlock ... aborting\n");
> + return -EDEADLK;
> + }
> +
> + h.type = CR_HDR_FILE;
> + h.len = sizeof(*hh);
> +
> + new = cr_obj_add_ptr(ctx, inode, &objref, CR_OBJ_INODE, 0);
> + cr_debug("objref %d inode %p new %d\n", objref, inode, new);
> + if (new < 0)
> + return new;
> +
> + hh->fd_type = (new ? CR_FD_PIPE : CR_FD_OBJREF);
The git commit msg has a good explanation, but it's worth a comment
in the code as well, that on first instance we call it
CR_FD_PIPE and second time CR_FD_OBJREF.
> + hh->fd_objref = objref;
> +
> + ret = cr_write_obj(ctx, &h, hh);
> + if (ret < 0)
> + return ret;
> +
> + if (new) {
> + mutex_lock(&inode->i_mutex);
> + ret = cr_write_pipe(ctx, inode);
> + mutex_unlock(&inode->i_mutex);
> + }
> +
> + return ret;
> +}
> +
> +
> /*
> * The file_operations structs are not static because they
> * are also used in linux/fs/fifo.c to do operations on FIFOs.
> @@ -787,6 +897,7 @@ const struct file_operations read_pipefifo_fops = {
> .open = pipe_read_open,
> .release = pipe_read_release,
> .fasync = pipe_read_fasync,
> + .checkpoint = pipe_file_checkpoint,
> };
>
> const struct file_operations write_pipefifo_fops = {
> @@ -799,6 +910,7 @@ const struct file_operations write_pipefifo_fops = {
> .open = pipe_write_open,
> .release = pipe_write_release,
> .fasync = pipe_write_fasync,
> + .checkpoint = pipe_file_checkpoint,
> };
>
> const struct file_operations rdwr_pipefifo_fops = {
> @@ -812,6 +924,7 @@ const struct file_operations rdwr_pipefifo_fops = {
> .open = pipe_rdwr_open,
> .release = pipe_rdwr_release,
> .fasync = pipe_rdwr_fasync,
> + .checkpoint = pipe_file_checkpoint,
> };
>
> struct pipe_inode_info * alloc_pipe_info(struct inode *inode)
> diff --git a/include/linux/checkpoint_hdr.h b/include/linux/checkpoint_hdr.h
> index 9ad845d..ce5d880 100644
> --- a/include/linux/checkpoint_hdr.h
> +++ b/include/linux/checkpoint_hdr.h
> @@ -57,6 +57,7 @@ enum {
> CR_HDR_FD_TABLE = 301,
> CR_HDR_FD_ENT,
> CR_HDR_FILE,
> + CR_HDR_FD_PIPE,
>
> CR_HDR_TAIL = 5001
> };
> @@ -152,7 +153,8 @@ struct cr_hdr_fd_ent {
> /* fd types */
> enum fd_type {
> CR_FD_OBJREF = 1,
> - CR_FD_GENERIC
> + CR_FD_GENERIC,
> + CR_FD_PIPE,
> };
>
> struct cr_hdr_file {
> @@ -165,4 +167,8 @@ struct cr_hdr_file {
> __u64 f_version;
> } __attribute__((aligned(8)));
>
> +struct cr_hdr_fd_pipe {
> + __s32 nr_bufs;
> +} __attribute__((aligned(8)));
> +
> #endif /* _CHECKPOINT_CKPT_HDR_H_ */
> --
> 1.5.4.3
next prev parent reply other threads:[~2009-04-01 19:47 UTC|newest]
Thread overview: 66+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-03-31 5:28 [RFC v14-rc2][PATCH 00/29] Kernel based checkpoint/restart Oren Laadan
[not found] ` <1238477349-11029-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 01/29] Create syscalls: sys_checkpoint, sys_restart Oren Laadan
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 02/29] Checkpoint/restart: initial documentation Oren Laadan
[not found] ` <1238477349-11029-3-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 3:22 ` Sukadev Bhattiprolu
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 03/29] Make file_pos_read/write() public Oren Laadan
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 04/29] General infrastructure for checkpoint restart Oren Laadan
[not found] ` <1238477349-11029-5-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 3:24 ` Sukadev Bhattiprolu
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 05/29] x86 support for checkpoint/restart Oren Laadan
[not found] ` <1238477349-11029-6-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 3:25 ` Sukadev Bhattiprolu
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 06/29] Dump memory address space Oren Laadan
[not found] ` <1238477349-11029-7-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 3:26 ` Sukadev Bhattiprolu
[not found] ` <20090407032636.GD12316-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-04-07 4:57 ` Oren Laadan
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 07/29] Restore " Oren Laadan
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 08/29] Infrastructure for shared objects Oren Laadan
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 09/29] Dump open file descriptors Oren Laadan
[not found] ` <1238477349-11029-10-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 3:28 ` Sukadev Bhattiprolu
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 10/29] actually use f_op in checkpoint code Oren Laadan
[not found] ` <1238477349-11029-11-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-03-31 18:31 ` Oren Laadan
2009-04-01 18:54 ` Serge E. Hallyn
2009-04-07 3:29 ` Sukadev Bhattiprolu
[not found] ` <20090407032912.GF12316-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-04-07 5:36 ` Oren Laadan
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 11/29] add generic checkpoint f_op to ext fses Oren Laadan
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 12/29] Restore open file descriptors Oren Laadan
[not found] ` <1238477349-11029-13-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 3:29 ` Sukadev Bhattiprolu
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 13/29] External checkpoint of a task other than ourself Oren Laadan
[not found] ` <1238477349-11029-14-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 3:30 ` Sukadev Bhattiprolu
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 14/29] Checkpoint multiple processes Oren Laadan
[not found] ` <1238477349-11029-15-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 3:31 ` Sukadev Bhattiprolu
[not found] ` <20090407033111.GI12316-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-04-07 5:12 ` Oren Laadan
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 15/29] Restart " Oren Laadan
[not found] ` <1238477349-11029-16-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 3:33 ` Sukadev Bhattiprolu
[not found] ` <20090407033315.GJ12316-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-04-07 5:31 ` Oren Laadan
[not found] ` <49DAE526.6010900-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-07 16:29 ` Sukadev Bhattiprolu
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 16/29] A new file type (CR_FD_OBJREF) for a file descriptor already setup Oren Laadan
[not found] ` <1238477349-11029-17-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-01 13:59 ` Serge E. Hallyn
[not found] ` <20090401135952.GA16973-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-04-01 14:13 ` Oren Laadan
2009-04-01 18:36 ` Serge E. Hallyn
2009-04-03 15:46 ` Dan Smith
[not found] ` <87y6uhyc3j.fsf-FLMGYpZoEPULwtHQx/6qkW3U47Q5hpJU@public.gmane.org>
2009-04-03 16:25 ` Oren Laadan
[not found] ` <49D63865.1030807-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-03 16:30 ` Dan Smith
2009-04-03 16:54 ` Dave Hansen
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 17/29] Checkpoint open pipes Oren Laadan
[not found] ` <1238477349-11029-18-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-01 19:47 ` Serge E. Hallyn [this message]
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 18/29] Restore " Oren Laadan
[not found] ` <1238477349-11029-19-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-01 20:34 ` Serge E. Hallyn
2009-03-31 5:28 ` [RFC v14-rc2][PATCH 19/29] Record 'struct file' object instead of the file name for VMAs Oren Laadan
[not found] ` <1238477349-11029-20-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-01 21:45 ` Serge E. Hallyn
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 20/29] Prepare to support shared memory Oren Laadan
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 21/29] Dump anonymous- and file-mapped- " Oren Laadan
[not found] ` <1238477349-11029-22-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-01 23:06 ` Serge E. Hallyn
[not found] ` <20090401230657.GB27725-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-04-01 23:18 ` Oren Laadan
[not found] ` <49D3F636.1070303-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-01 23:32 ` Serge E. Hallyn
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 22/29] Restore " Oren Laadan
[not found] ` <1238477349-11029-23-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-02 16:59 ` Serge E. Hallyn
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 23/29] s390: Expose a constant for the number of words representing the CRs Oren Laadan
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 24/29] c/r: Add CR_COPY() macro (v4) Oren Laadan
[not found] ` <1238477349-11029-25-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-04-01 23:20 ` Serge E. Hallyn
[not found] ` <20090401232013.GA31361-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-04-02 19:00 ` Dan Smith
[not found] ` <87vdpmnan2.fsf-FLMGYpZoEPULwtHQx/6qkW3U47Q5hpJU@public.gmane.org>
2009-04-02 19:06 ` Serge E. Hallyn
[not found] ` <20090402190612.GA24390-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-04-02 20:22 ` Dan Smith
[not found] ` <87r60an6us.fsf-FLMGYpZoEPULwtHQx/6qkW3U47Q5hpJU@public.gmane.org>
2009-04-05 20:25 ` Oren Laadan
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 25/29] s390: define s390-specific checkpoint-restart code (v7) Oren Laadan
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 26/29] powerpc: provide APIs for validating and updating DABR Oren Laadan
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 27/29] powerpc: checkpoint/restart implementation Oren Laadan
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 28/29] powerpc: wire up checkpoint and restart syscalls Oren Laadan
2009-03-31 5:29 ` [RFC v14-rc2][PATCH 29/29] powerpc: enable checkpoint support in Kconfig Oren Laadan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090401194726.GA24066@us.ibm.com \
--to=serue-r/jw6+rmf7hqt0dzr+alfa@public.gmane.org \
--cc=containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
--cc=dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org \
--cc=orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox