linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Al Viro <viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
To: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Cc: Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	Linus Torvalds <torvalds-3NddpPZAyC0@public.gmane.org>,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>,
	Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>,
	Dave Hansen
	<dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>,
	Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>,
	"H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
Subject: Re: [RFC v10][PATCH 05/13] Dump memory address space
Date: Fri, 28 Nov 2008 10:53:51 +0000	[thread overview]
Message-ID: <20081128105351.GQ28946@ZenIV.linux.org.uk> (raw)
In-Reply-To: <1227747884-14150-6-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>

On Wed, Nov 26, 2008 at 08:04:36PM -0500, Oren Laadan wrote:
> For each VMA, there is a 'struct cr_vma'; if the VMA is file-mapped,
> it will be followed by the file name. Then comes the actual contents,
> in one or more chunk: each chunk begins with a header that specifies
> how many pages it holds, then the virtual addresses of all the dumped
> pages in that chunk, followed by the actual contents of all dumped
> pages. A header with zero number of pages marks the end of the contents.
> Then comes the next VMA and so on.
> 
> Changelog[v10]:
>   - Acquire dcache_lock around call to __d_path() in cr_fill_name()
> 
> Changelog[v9]:
>   - Introduce cr_ctx_checkpoint() for checkpoint-specific ctx setup
>   - Test if __d_path() changes mnt/dentry (when crossing filesystem
>     namespace boundary). for now cr_fill_fname() fails the checkpoint.
> 
> Changelog[v7]:
>   - Fix argument given to kunmap_atomic() in memory dump/restore
> 
> Changelog[v6]:
>   - Balance all calls to cr_hbuf_get() with matching cr_hbuf_put()
>     (even though it's not really needed)
> 
> Changelog[v5]:
>   - Improve memory dump code (following Dave Hansen's comments)
>   - Change dump format (and code) to allow chunks of <vaddrs, pages>
>     instead of one long list of each
>   - Fix use of follow_page() to avoid faulting in non-present pages
> 
> Changelog[v4]:
>   - Use standard list_... for cr_pgarr
> 
> Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
> Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Dave Hansen <dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> ---
>  arch/x86/include/asm/checkpoint_hdr.h |    5 +
>  arch/x86/mm/checkpoint.c              |   31 ++
>  checkpoint/Makefile                   |    3 +-
>  checkpoint/checkpoint.c               |   81 ++++++
>  checkpoint/checkpoint_arch.h          |    2 +
>  checkpoint/checkpoint_mem.h           |   41 +++
>  checkpoint/ckpt_mem.c                 |  500 +++++++++++++++++++++++++++++++++
>  checkpoint/sys.c                      |   11 +
>  include/linux/checkpoint.h            |   12 +
>  include/linux/checkpoint_hdr.h        |   32 ++
>  10 files changed, 717 insertions(+), 1 deletions(-)
>  create mode 100644 checkpoint/checkpoint_mem.h
>  create mode 100644 checkpoint/ckpt_mem.c
> 
> diff --git a/arch/x86/include/asm/checkpoint_hdr.h b/arch/x86/include/asm/checkpoint_hdr.h
> index 6325062..33f4c70 100644
> --- a/arch/x86/include/asm/checkpoint_hdr.h
> +++ b/arch/x86/include/asm/checkpoint_hdr.h
> @@ -82,4 +82,9 @@ struct cr_hdr_cpu {
>  	/* thread_xstate contents follow (if used_math) */
>  } __attribute__((aligned(8)));
>  
> +struct cr_hdr_mm_context {
> +	__s16 ldt_entry_size;
> +	__s16 nldt;
> +} __attribute__((aligned(8)));
> +
>  #endif /* __ASM_X86_CKPT_HDR__H */
> diff --git a/arch/x86/mm/checkpoint.c b/arch/x86/mm/checkpoint.c
> index 8dd6d2d..757936e 100644
> --- a/arch/x86/mm/checkpoint.c
> +++ b/arch/x86/mm/checkpoint.c
> @@ -221,3 +221,34 @@ int cr_write_head_arch(struct cr_ctx *ctx)
>  
>  	return ret;
>  }
> +
> +/* dump the mm->context state */
> +int cr_write_mm_context(struct cr_ctx *ctx, struct mm_struct *mm, int parent)
> +{
> +	struct cr_hdr h;
> +	struct cr_hdr_mm_context *hh = cr_hbuf_get(ctx, sizeof(*hh));
> +	int ret;
> +
> +	h.type = CR_HDR_MM_CONTEXT;
> +	h.len = sizeof(*hh);
> +	h.parent = parent;
> +
> +	mutex_lock(&mm->context.lock);
> +
> +	hh->ldt_entry_size = LDT_ENTRY_SIZE;
> +	hh->nldt = mm->context.size;
> +
> +	cr_debug("nldt %d\n", hh->nldt);
> +
> +	ret = cr_write_obj(ctx, &h, hh);
> +	cr_hbuf_put(ctx, sizeof(*hh));
> +	if (ret < 0)
> +		goto out;
> +
> +	ret = cr_kwrite(ctx, mm->context.ldt,
> +			mm->context.size * LDT_ENTRY_SIZE);
> +
> + out:
> +	mutex_unlock(&mm->context.lock);
> +	return ret;
> +}
> diff --git a/checkpoint/Makefile b/checkpoint/Makefile
> index d2df68c..3a0df6d 100644
> --- a/checkpoint/Makefile
> +++ b/checkpoint/Makefile
> @@ -2,4 +2,5 @@
>  # Makefile for linux checkpoint/restart.
>  #
>  
> -obj-$(CONFIG_CHECKPOINT_RESTART) += sys.o checkpoint.o restart.o
> +obj-$(CONFIG_CHECKPOINT_RESTART) += sys.o checkpoint.o restart.o \
> +		ckpt_mem.o
> diff --git a/checkpoint/checkpoint.c b/checkpoint/checkpoint.c
> index 17cc8d2..6a8f810 100644
> --- a/checkpoint/checkpoint.c
> +++ b/checkpoint/checkpoint.c
> @@ -75,6 +75,66 @@ int cr_write_string(struct cr_ctx *ctx, char *str, int len)
>  	return cr_write_obj(ctx, &h, str);
>  }
>  
> +/**
> + * cr_fill_fname - return pathname of a given file
> + * @path: path name
> + * @root: relative root
> + * @buf: buffer for pathname
> + * @n: buffer length (in) and pathname length (out)
> + */
> +static char *
> +cr_fill_fname(struct path *path, struct path *root, char *buf, int *n)
> +{
> +	struct path tmp = *root;
> +	char *fname;
> +
> +	BUG_ON(!buf);
> +	spin_lock(&dcache_lock);
> +	fname = __d_path(path, &tmp, buf, *n);
> +	spin_unlock(&dcache_lock);
> +	if (!IS_ERR(fname))
> +		*n = (buf + (*n) - fname);
> +	/*
> +	 * FIXME: if __d_path() changed these, it must have stepped out of
> +	 * init's namespace. Since currently we require a unified namespace
> +	 * within the container: simply fail.
> +	 */
> +	if (tmp.mnt != root->mnt || tmp.dentry != root->dentry)
> +		fname = ERR_PTR(-EBADF);
> +
> +	return fname;
> +}
> +
> +/**
> + * cr_write_fname - write a file name
> + * @ctx: checkpoint context
> + * @path: path name
> + * @root: relative root
> + */
> +int cr_write_fname(struct cr_ctx *ctx, struct path *path, struct path *root)
> +{
> +	struct cr_hdr h;
> +	char *buf, *fname;
> +	int ret, flen;
> +
> +	flen = PATH_MAX;
> +	buf = kmalloc(flen, GFP_KERNEL);
> +	if (!buf)
> +		return -ENOMEM;
> +
> +	fname = cr_fill_fname(path, root, buf, &flen);
> +	if (!IS_ERR(fname)) {
> +		h.type = CR_HDR_FNAME;
> +		h.len = flen;
> +		h.parent = 0;
> +		ret = cr_write_obj(ctx, &h, fname);
> +	} else
> +		ret = PTR_ERR(fname);
> +
> +	kfree(buf);
> +	return ret;
> +}
> +
>  /* write the checkpoint header */
>  static int cr_write_head(struct cr_ctx *ctx)
>  {
> @@ -168,6 +228,10 @@ static int cr_write_task(struct cr_ctx *ctx, struct task_struct *t)
>  	cr_debug("task_struct: ret %d\n", ret);
>  	if (ret < 0)
>  		goto out;
> +	ret = cr_write_mm(ctx, t);
> +	cr_debug("memory: ret %d\n", ret);
> +	if (ret < 0)
> +		goto out;
>  	ret = cr_write_thread(ctx, t);
>  	cr_debug("thread: ret %d\n", ret);
>  	if (ret < 0)
> @@ -178,10 +242,27 @@ static int cr_write_task(struct cr_ctx *ctx, struct task_struct *t)
>  	return ret;
>  }
>  
> +static int cr_ctx_checkpoint(struct cr_ctx *ctx, pid_t pid)
> +{
> +	ctx->root_pid = pid;
> +
> +	/*
> +	 * assume checkpointer is in container's root vfs
> +	 * FIXME: this works for now, but will change with real containers
> +	 */
> +	ctx->vfsroot = &current->fs->root;
> +	path_get(ctx->vfsroot);

This is going to break as soon as you get another thread doing e.g. chroot(2)
while you are in there.  And it's a really, _really_ bad idea to take a
pointer to shared object, increment refcount on the current *contents* of
said object and assume that dropping refcount on the later contents of the
same will balance out.
--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  parent reply	other threads:[~2008-11-28 10:53 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-11-27  1:04 [RFC v10][PATCH 00/13] Kernel based checkpoint/restart Oren Laadan
     [not found] ` <1227747884-14150-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-11-27  1:04   ` [RFC v10][PATCH 01/13] Create syscalls: sys_checkpoint, sys_restart Oren Laadan
2008-11-27  1:04   ` [RFC v10][PATCH 02/13] Checkpoint/restart: initial documentation Oren Laadan
     [not found]     ` <1227747884-14150-3-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-11-28 10:45       ` Al Viro
     [not found]         ` <20081128104554.GP28946-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2008-12-01 18:15           ` Dave Hansen
2008-11-27  1:04   ` [RFC v10][PATCH 03/13] General infrastructure for checkpoint restart Oren Laadan
2008-11-27  1:04   ` [RFC v10][PATCH 04/13] x86 support for checkpoint/restart Oren Laadan
2008-11-27  1:04   ` [RFC v10][PATCH 05/13] Dump memory address space Oren Laadan
     [not found]     ` <1227747884-14150-6-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-11-28 10:53       ` Al Viro [this message]
     [not found]         ` <20081128105351.GQ28946-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2008-12-01 18:00           ` Dave Hansen
2008-12-01 20:57             ` Oren Laadan
2008-11-27  1:04   ` [RFC v10][PATCH 06/13] Restore " Oren Laadan
2008-11-27  1:04   ` [RFC v10][PATCH 07/13] Infrastructure for shared objects Oren Laadan
2008-11-27  1:04   ` [RFC v10][PATCH 08/13] Dump open file descriptors Oren Laadan
     [not found]     ` <1227747884-14150-9-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-11-28 10:19       ` Al Viro
     [not found]         ` <20081128101919.GO28946-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2008-12-01 17:47           ` Dave Hansen
2008-12-01 20:23             ` Oren Laadan
     [not found]               ` <493447DD.7010102-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-12-01 20:51                 ` Dave Hansen
2008-12-01 21:02                   ` Linus Torvalds
     [not found]                     ` <alpine.LFD.2.00.0812011258390.3256-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-12-01 21:25                       ` Dave Hansen
2008-12-01 21:20                   ` Oren Laadan
2008-11-27  1:04   ` [RFC v10][PATCH 09/13] Restore open file descriprtors Oren Laadan
     [not found]     ` <1227747884-14150-10-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-11-28 11:27       ` Al Viro
     [not found]         ` <20081128112745.GR28946-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2008-12-01 19:22           ` Dave Hansen
2008-12-01 20:41             ` Oren Laadan
     [not found]               ` <49344C11.6090204-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-12-01 20:54                 ` Dave Hansen
2008-12-01 21:00                   ` Oren Laadan
     [not found]                     ` <49345086.4-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-12-01 21:07                       ` Dave Hansen
2008-12-02  1:31                         ` Dave Hansen
2008-12-02  1:12                       ` Dave Hansen
2008-11-27  1:04   ` [RFC v10][PATCH 10/13] External checkpoint of a task other than ourself Oren Laadan
2008-11-27  1:04   ` [RFC v10][PATCH 11/13] Track in-kernel when we expect checkpoint/restart to work Oren Laadan
2008-11-27  1:04   ` [RFC v10][PATCH 12/13] Checkpoint multiple processes Oren Laadan
2008-11-27  1:04   ` [RFC v10][PATCH 13/13] Restart " Oren Laadan
2008-12-03 23:58   ` [RFC v10][PATCH 00/13] Kernel based checkpoint/restart Serge E. Hallyn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081128105351.GQ28946@ZenIV.linux.org.uk \
    --to=viro-3bdd1+5odreifsdqtta3olvcufugdwfn@public.gmane.org \
    --cc=akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org \
    --cc=containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
    --cc=dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org \
    --cc=hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org \
    --cc=linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org \
    --cc=mingo-X9Un+BFzKDI@public.gmane.org \
    --cc=orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org \
    --cc=serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org \
    --cc=tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org \
    --cc=torvalds-3NddpPZAyC0@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).