From: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
To: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Cc: Linus Torvalds <torvalds-3NddpPZAyC0@public.gmane.org>,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>,
Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>,
Dave Hansen
<dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>,
Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>,
"H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>,
Alexander Viro
<viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org>,
Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Subject: [RFC v9][PATCH 08/13] Dump open file descriptors
Date: Mon, 10 Nov 2008 11:37:35 -0500
Message-ID: <1226335060-7061-9-git-send-email-orenl@cs.columbia.edu>
In-Reply-To: <1226335060-7061-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Dump the files_struct of a task with 'struct cr_hdr_files', followed by
all open file descriptors. Since a single open file (struct file) may be
referenced by several FDs, each file is assigned an objref and registered
in the object hash.
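
To illustrate the sharing rule, here is a minimal userspace sketch of a
pointer-to-objref table. It is not the kernel's object hash: the names
struct objtab and objtab_add() are hypothetical, and a linear search
stands in for hashing. As with cr_obj_add_ptr() in this patch, the return
value is 1 only when the pointer is seen for the first time, which tells
the caller whether to dump the object's state or just its objref.

```c
#include <stdlib.h>

/* Hypothetical pointer -> objref table (linear search, for clarity only) */
struct objtab {
	const void **ptrs;	/* ptrs[i] was assigned objref i + 1 */
	int n, cap;
};

/*
 * Look up ptr; add it if absent.  *objref receives its identifier.
 * Returns 1 if ptr was newly added, 0 if already present, -1 on ENOMEM.
 */
static int objtab_add(struct objtab *t, const void *ptr, int *objref)
{
	for (int i = 0; i < t->n; i++) {
		if (t->ptrs[i] == ptr) {
			*objref = i + 1;	/* shared: reuse existing objref */
			return 0;
		}
	}
	if (t->n == t->cap) {
		int cap = t->cap ? 2 * t->cap : 16;
		const void **p = realloc(t->ptrs, cap * sizeof(*p));

		if (!p)
			return -1;	/* old table is still valid */
		t->ptrs = p;
		t->cap = cap;
	}
	t->ptrs[t->n++] = ptr;
	*objref = t->n;		/* first sighting: new objref */
	return 1;
}
```
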

For each open FD there is a 'struct cr_hdr_fd_ent' with the FD, its objref
and its close-on-exec flag. If the file behind the FD is seen for the
first time (i.e. it was newly added to the hash), the entry is followed
by a 'struct cr_hdr_fd_data' with the file state and then the file's
path name. The entry for the next FD follows, and so on.
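
The resulting record order can be sketched in userspace C. The structs
below mirror this patch's definitions, with <stdint.h> types in place of
__u16/__u32/__u64. The static assertions make the aligned(8) padding
visible: cr_hdr_fd_ent holds 12 bytes of fields but occupies 16.
record_sequence() is a hypothetical helper that lists record types only.
It omits the file name record that cr_write_fd_data() writes after each
CR_HDR_FD_DATA, and it assumes a GCC-compatible compiler for __attribute__.

```c
#include <stdint.h>

/* Userspace mirrors of the headers added to checkpoint_hdr.h */
struct cr_hdr_files {
	uint32_t objref;
	uint32_t nfds;
} __attribute__((aligned(8)));

struct cr_hdr_fd_ent {
	uint32_t objref;
	int32_t fd;
	uint32_t close_on_exec;
} __attribute__((aligned(8)));

struct cr_hdr_fd_data {
	uint16_t fd_type;
	uint16_t f_mode;
	uint32_t f_flags;
	uint64_t f_pos;
	uint64_t f_version;
} __attribute__((aligned(8)));

_Static_assert(sizeof(struct cr_hdr_files) == 8, "two 32-bit fields");
_Static_assert(sizeof(struct cr_hdr_fd_ent) == 16, "12 bytes, padded to 16");
_Static_assert(sizeof(struct cr_hdr_fd_data) == 24, "no padding needed");

enum { CR_HDR_FILES = 301, CR_HDR_FD_ENT, CR_HDR_FD_DATA };

/*
 * Fill out[] (room for 1 + 2 * nfds entries) with the record types
 * emitted for nfds descriptors; is_new[i] says whether the file behind
 * descriptor i is seen for the first time.  Returns the record count.
 */
static int record_sequence(const int *is_new, int nfds, int *out)
{
	int n = 0;

	out[n++] = CR_HDR_FILES;
	for (int i = 0; i < nfds; i++) {
		out[n++] = CR_HDR_FD_ENT;	/* always: fd, objref, c-o-e */
		if (is_new[i])
			out[n++] = CR_HDR_FD_DATA;	/* only on first sighting */
	}
	return n;
}
```
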

This patch handles only basic FDs: regular files, directories and
symbolic links.

Changelog[v9]:
- Fix a couple of leaks in cr_write_files()
- Drop useless kfree from cr_scan_fds()
Changelog[v8]:
- initialize 'coe' to work around a false gcc warning
Changelog[v6]:
- Balance all calls to cr_hbuf_get() with matching cr_hbuf_put()
(even though it's not really needed)
Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Dave Hansen <dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
checkpoint/Makefile | 2 +-
checkpoint/checkpoint.c | 4 +
checkpoint/checkpoint_file.h | 17 +++
checkpoint/ckpt_file.c | 231 ++++++++++++++++++++++++++++++++++++++++
include/linux/checkpoint.h | 3 +-
include/linux/checkpoint_hdr.h | 32 ++++++-
6 files changed, 286 insertions(+), 3 deletions(-)
create mode 100644 checkpoint/checkpoint_file.h
create mode 100644 checkpoint/ckpt_file.c
diff --git a/checkpoint/Makefile b/checkpoint/Makefile
index 9843fb9..7496695 100644
--- a/checkpoint/Makefile
+++ b/checkpoint/Makefile
@@ -3,4 +3,4 @@
#
obj-$(CONFIG_CHECKPOINT_RESTART) += sys.o checkpoint.o restart.o objhash.o \
- ckpt_mem.o rstr_mem.o
+ ckpt_mem.o rstr_mem.o ckpt_file.o
diff --git a/checkpoint/checkpoint.c b/checkpoint/checkpoint.c
index e162753..700f829 100644
--- a/checkpoint/checkpoint.c
+++ b/checkpoint/checkpoint.c
@@ -213,6 +213,10 @@ static int cr_write_task(struct cr_ctx *ctx, struct task_struct *t)
 	cr_debug("memory: ret %d\n", ret);
 	if (ret < 0)
 		goto out;
+	ret = cr_write_files(ctx, t);
+	cr_debug("files: ret %d\n", ret);
+	if (ret < 0)
+		goto out;
 	ret = cr_write_thread(ctx, t);
 	cr_debug("thread: ret %d\n", ret);
 	if (ret < 0)
diff --git a/checkpoint/checkpoint_file.h b/checkpoint/checkpoint_file.h
new file mode 100644
index 0000000..9dc3eba
--- /dev/null
+++ b/checkpoint/checkpoint_file.h
@@ -0,0 +1,17 @@
+#ifndef _CHECKPOINT_CKPT_FILE_H_
+#define _CHECKPOINT_CKPT_FILE_H_
+/*
+ * Checkpoint file descriptors
+ *
+ * Copyright (C) 2008 Oren Laadan
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License. See the file COPYING in the main directory of the Linux
+ * distribution for more details.
+ */
+
+#include <linux/fdtable.h>
+
+int cr_scan_fds(struct files_struct *files, int **fdtable);
+
+#endif /* _CHECKPOINT_CKPT_FILE_H_ */
diff --git a/checkpoint/ckpt_file.c b/checkpoint/ckpt_file.c
new file mode 100644
index 0000000..9198650
--- /dev/null
+++ b/checkpoint/ckpt_file.c
@@ -0,0 +1,231 @@
+/*
+ * Checkpoint file descriptors
+ *
+ * Copyright (C) 2008 Oren Laadan
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License. See the file COPYING in the main directory of the Linux
+ * distribution for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/file.h>
+#include <linux/fdtable.h>
+#include <linux/checkpoint.h>
+#include <linux/checkpoint_hdr.h>
+
+#include "checkpoint_file.h"
+
+#define CR_DEFAULT_FDTABLE 256 /* an initial guess */
+
+/**
+ * cr_scan_fds - scan file table and construct array of open fds
+ * @files: files_struct pointer
+ * @fdtable: (output) array of open fds
+ *
+ * Returns the number of open fds found, and also the file table
+ * array via *fdtable. The caller should free the array.
+ *
+ * The caller must validate the file descriptors collected in the
+ * array before using them, e.g. by using fcheck_files(), in case
+ * the task's fdtable changes in the meantime.
+ */
+int cr_scan_fds(struct files_struct *files, int **fdtable)
+{
+	struct fdtable *fdt;
+	int *fds;
+	int i, n = 0;
+	int tot = CR_DEFAULT_FDTABLE;
+
+	fds = kmalloc(tot * sizeof(*fds), GFP_KERNEL);
+	if (!fds)
+		return -ENOMEM;
+
+	/*
+	 * We assume that the target task is frozen (or that we checkpoint
+	 * ourselves), so we can safely proceed after krealloc() from where
+	 * we left off; in the worst case, restart will fail.
+	 */
+
+	spin_lock(&files->file_lock);
+	rcu_read_lock();
+	fdt = files_fdtable(files);
+	for (i = 0; i < fdt->max_fds; i++) {
+		if (!fcheck_files(files, i))
+			continue;
+		if (n == tot) {
+			/*
+			 * fcheck_files() is safe with drop/re-acquire
+			 * of the lock, because it tests: fd < max_fds
+			 */
+			spin_unlock(&files->file_lock);
+			rcu_read_unlock();
+			tot *= 2;	/* won't overflow: kmalloc will fail */
+			fds = krealloc(fds, tot * sizeof(*fds), GFP_KERNEL);
+			if (!fds)
+				return -ENOMEM;
+			rcu_read_lock();
+			spin_lock(&files->file_lock);
+		}
+		fds[n++] = i;
+	}
+	rcu_read_unlock();
+	spin_unlock(&files->file_lock);
+
+	*fdtable = fds;
+	return n;
+}
+
+/* cr_write_fd_data - dump the state of a given file pointer */
+static int cr_write_fd_data(struct cr_ctx *ctx, struct file *file, int parent)
+{
+	struct cr_hdr h;
+	struct cr_hdr_fd_data *hh = cr_hbuf_get(ctx, sizeof(*hh));
+	struct dentry *dent = file->f_dentry;
+	struct inode *inode = dent->d_inode;
+	enum fd_type fd_type;
+	int ret;
+
+	h.type = CR_HDR_FD_DATA;
+	h.len = sizeof(*hh);
+	h.parent = parent;
+
+	hh->f_flags = file->f_flags;
+	hh->f_mode = file->f_mode;
+	hh->f_pos = file->f_pos;
+	hh->f_version = file->f_version;
+	/* FIX: need also file->uid, file->gid, file->f_owner, etc */
+
+	switch (inode->i_mode & S_IFMT) {
+	case S_IFREG:
+		fd_type = CR_FD_FILE;
+		break;
+	case S_IFDIR:
+		fd_type = CR_FD_DIR;
+		break;
+	case S_IFLNK:
+		fd_type = CR_FD_LINK;
+		break;
+	default:
+		cr_hbuf_put(ctx, sizeof(*hh));
+		return -EBADF;
+	}
+
+	/* FIX: check if the file/dir/link is unlinked */
+	hh->fd_type = fd_type;
+
+	ret = cr_write_obj(ctx, &h, hh);
+	cr_hbuf_put(ctx, sizeof(*hh));
+	if (ret < 0)
+		return ret;
+
+	return cr_write_fname(ctx, &file->f_path, ctx->vfsroot);
+}
+
+/**
+ * cr_write_fd_ent - dump the state of a given file descriptor
+ * @ctx: checkpoint context
+ * @files: files_struct pointer
+ * @fd: file descriptor
+ *
+ * Saves the state of the file descriptor; looks up the actual file
+ * pointer in the hash table, and if found saves the matching objref,
+ * otherwise calls cr_write_fd_data to dump the file pointer too.
+ */
+static int
+cr_write_fd_ent(struct cr_ctx *ctx, struct files_struct *files, int fd)
+{
+	struct cr_hdr h;
+	struct cr_hdr_fd_ent *hh = cr_hbuf_get(ctx, sizeof(*hh));
+	struct file *file;
+	struct fdtable *fdt;
+	int objref, new, ret;
+	int coe = 0;	/* avoid gcc warning */
+
+	rcu_read_lock();
+	fdt = files_fdtable(files);
+	file = fcheck_files(files, fd);
+	if (file) {
+		coe = FD_ISSET(fd, fdt->close_on_exec);
+		get_file(file);
+	}
+	rcu_read_unlock();
+
+	/* sanity check (although this shouldn't happen) */
+	if (!file) {
+		ret = -EBADF;
+		goto out;
+	}
+
+	new = cr_obj_add_ptr(ctx, file, &objref, CR_OBJ_FILE, 0);
+	cr_debug("fd %d objref %d file %p c-o-e %d\n", fd, objref, file, coe);
+
+	if (new < 0) {
+		ret = new;
+		goto out;
+	}
+
+	h.type = CR_HDR_FD_ENT;
+	h.len = sizeof(*hh);
+	h.parent = 0;
+
+	hh->objref = objref;
+	hh->fd = fd;
+	hh->close_on_exec = coe;
+
+	ret = cr_write_obj(ctx, &h, hh);
+	if (ret < 0)
+		goto out;
+
+	/* new==1 if-and-only-if file was newly added to hash */
+	if (new)
+		ret = cr_write_fd_data(ctx, file, objref);
+
+out:
+	cr_hbuf_put(ctx, sizeof(*hh));
+	if (file)
+		fput(file);
+	return ret;
+}
+
+int cr_write_files(struct cr_ctx *ctx, struct task_struct *t)
+{
+	struct cr_hdr h;
+	struct cr_hdr_files *hh = cr_hbuf_get(ctx, sizeof(*hh));
+	struct files_struct *files;
+	int *fdtable = NULL;
+	int nfds, n, ret;
+
+	h.type = CR_HDR_FILES;
+	h.len = sizeof(*hh);
+	h.parent = task_pid_vnr(t);
+
+	files = get_files_struct(t);
+
+	nfds = cr_scan_fds(files, &fdtable);
+	if (nfds < 0) {
+		ret = nfds;
+		goto out;
+	}
+
+	hh->objref = 0;	/* will be meaningful with multiple processes */
+	hh->nfds = nfds;
+
+	ret = cr_write_obj(ctx, &h, hh);
+	cr_hbuf_put(ctx, sizeof(*hh));
+	if (ret < 0)
+		goto out;
+
+	cr_debug("nfds %d\n", nfds);
+	for (n = 0; n < nfds; n++) {
+		ret = cr_write_fd_ent(ctx, files, fdtable[n]);
+		if (ret < 0)
+			break;
+	}
+
+ out:
+	kfree(fdtable);
+	put_files_struct(files);
+	return ret;
+}
diff --git a/include/linux/checkpoint.h b/include/linux/checkpoint.h
index 0e4ba74..bca7aef 100644
--- a/include/linux/checkpoint.h
+++ b/include/linux/checkpoint.h
@@ -13,7 +13,7 @@
#include <linux/path.h>
#include <linux/fs.h>
-#define CR_VERSION 1
+#define CR_VERSION 2
struct cr_ctx {
int crid; /* unique checkpoint id */
@@ -80,6 +80,7 @@ extern struct file *cr_read_open_fname(struct cr_ctx *ctx,
extern int do_checkpoint(struct cr_ctx *ctx, pid_t pid);
extern int cr_write_mm(struct cr_ctx *ctx, struct task_struct *t);
+extern int cr_write_files(struct cr_ctx *ctx, struct task_struct *t);
extern int do_restart(struct cr_ctx *ctx, pid_t pid);
extern int cr_read_mm(struct cr_ctx *ctx);
diff --git a/include/linux/checkpoint_hdr.h b/include/linux/checkpoint_hdr.h
index c2e1022..3a21179 100644
--- a/include/linux/checkpoint_hdr.h
+++ b/include/linux/checkpoint_hdr.h
@@ -17,7 +17,7 @@
/*
* To maintain compatibility between 32-bit and 64-bit architecture flavors,
* keep data 64-bit aligned: use padding for structure members, and use
- * __attribute__ ((aligned (8))) for the entire structure.
+ * __attribute__((aligned(8))) for the entire structure.
*/
/* records: generic header */
@@ -44,6 +44,10 @@ enum {
CR_HDR_PGARR,
CR_HDR_MM_CONTEXT,
+	CR_HDR_FILES = 301,
+	CR_HDR_FD_ENT,
+	CR_HDR_FD_DATA,
+
CR_HDR_TAIL = 5001
};
@@ -106,4 +110,30 @@ struct cr_hdr_pgarr {
__u64 nr_pages; /* number of pages to saved */
} __attribute__((aligned(8)));
+struct cr_hdr_files {
+	__u32 objref;		/* identifier for shared objects */
+	__u32 nfds;
+} __attribute__((aligned(8)));
+
+struct cr_hdr_fd_ent {
+	__u32 objref;		/* identifier for shared objects */
+	__s32 fd;
+	__u32 close_on_exec;
+} __attribute__((aligned(8)));
+
+/* fd types */
+enum fd_type {
+	CR_FD_FILE = 1,
+	CR_FD_DIR,
+	CR_FD_LINK
+};
+
+struct cr_hdr_fd_data {
+	__u16 fd_type;
+	__u16 f_mode;
+	__u32 f_flags;
+	__u64 f_pos;
+	__u64 f_version;
+} __attribute__((aligned(8)));
+
#endif /* _CHECKPOINT_CKPT_HDR_H_ */
--
1.5.4.3
--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html