All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Cc: Louis.Rilling-aw0BnHfMbSpBDgjK7y7TUQ@public.gmane.org
Subject: Re: [RFC][PATCH 2/2] CR: handle a single task with private memory maps
Date: Wed, 30 Jul 2008 11:52:49 -0500	[thread overview]
Message-ID: <20080730165249.GA23802@us.ibm.com> (raw)

This list is getting on my nerves.  Louis, I'm sorry the threading
is going to get messed up.

----- Forwarded message from mailman-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org -----

Subject: Content filtered message notification
From: mailman-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
To: containers-owner-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Date: Wed, 30 Jul 2008 09:16:12 -0700

The attached message matched the containers mailing list's content
filtering rules and was prevented from being forwarded on to the list
membership.  You are receiving the only remaining copy of the
discarded message.


Date: Wed, 30 Jul 2008 18:15:35 +0200
From: Louis Rilling <Louis.Rilling-aw0BnHfMbSpBDgjK7y7TUQ@public.gmane.org>
To: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Cc: Linux Containers <containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
Subject: Re: [RFC][PATCH 2/2] CR: handle a single task with private memory
	maps
Reply-To: Louis.Rilling-aw0BnHfMbSpBDgjK7y7TUQ@public.gmane.org

Hi Oren,

On Tue, Jul 29, 2008 at 11:27:17PM -0400, Oren Laadan wrote:
> 
> Expand the template sys_checkpoint and sys_restart to be able to dump
> and restore a single task. The task's address space may consist of only
> private, simple vma's - anonymous or file-mapped.
> 
> This big patch adds a mechanism to transfer data between kernel or user
> space to and from the file given by the caller (sys.c), alloc/setup/free
> of the checkpoint/restart context (sys.c), output wrappers and basic
> checkpoint handling (checkpoint.c), memory dump (ckpt_mem.c), input
> wrappers and basic restart handling (restart.c), and finally the memory
> restore (rstr_mem.c).

This looks globally clean to me, but I'm sure that others will have stronger
arguments against or in favor of it.

Just a few comments inline, in case it helps.

[...]

> diff --git a/ckpt/checkpoint.c b/ckpt/checkpoint.c
> new file mode 100644
> index 0000000..1698a35
> --- /dev/null
> +++ b/ckpt/checkpoint.c

[...]

> +/* dump the task_struct of a given task */
> +static int cr_write_task_struct(struct cr_ctx *ctx, struct task_struct *t)
> +{
> +	struct cr_hdr h;
> +	struct cr_hdr_task *hh = ctx->tbuf;
> +
> +	h.type = CR_HDR_TASK;
> +	h.len = sizeof(*hh);
> +	h.id = ctx->pid;
> +
> +	hh->state = t->state;
> +	hh->exit_state = t->exit_state;
> +	hh->exit_code = t->exit_code;
> +	hh->exit_signal = t->exit_signal;
> +
> +	hh->pid = t->pid;
> +	hh->tgid = t->tgid;

IIRC, it is assumed that pid and tgid will be restored before actually calling
sys_restart(), eg by giving the proper pid and clone flags to a variant of
do_fork(). So, maybe these ids are useless here and should be put earlier in the
checkpoint header (see also the matching comment below in cr_read_task_struct()).

[...]

> diff --git a/ckpt/ckpt_mem.c b/ckpt/ckpt_mem.c
> new file mode 100644
> index 0000000..12caad0
> --- /dev/null
> +++ b/ckpt/ckpt_mem.c

[...]

> +/**
> + * cr_vma_fill_pgarr - fill a page-array with addr/page tuples for a vma
> + * @ctx - checkpoint context
> + * @pgarr - page-array to fill
> + * @vma - vma to scan
> + * @start - start address (updated)
> + */
> +static int cr_vma_fill_pgarr(struct cr_ctx *ctx, struct cr_pgarr *pgarr,
> +			     struct vm_area_struct *vma, unsigned long *start)
> +{
> +	unsigned long end = vma->vm_end;
> +	unsigned long addr = *start;
> +	struct page **pagep;
> +	unsigned long *addrp;
> +	int cow, nr, ret = 0;
> +
> +	nr = pgarr->nleft;
> +	pagep = &pgarr->pages[pgarr->nused];
> +	addrp = &pgarr->addrs[pgarr->nused];
> +	cow = !!vma->vm_file;
> +
> +	while (addr < end) {
> +		struct page *page;
> +
> +		/* simplified version of get_user_pages(): already have vma,
> +		* only need FOLL_TOUCH, and (for now) ignore fault stats */
> +
> +		cond_resched();
> +		while (!(page = follow_page(vma, addr, FOLL_TOUCH))) {
> +			ret = handle_mm_fault(vma->vm_mm, vma, addr, 0);
> +			if (ret & VM_FAULT_ERROR) {
> +				if (ret & VM_FAULT_OOM)
> +					ret = -ENOMEM;
> +				else if (ret & VM_FAULT_SIGBUS)
> +					ret = -EFAULT;
> +				else
> +					BUG();
> +				break;
> +			}
> +			cond_resched();
> +		}

I guess that 'ret' should be checked somewhere after this loop.

> +
> +		if (IS_ERR(page)) {
> +			ret = PTR_ERR(page);
> +			break;
> +		}
> +
> +		if (page == ZERO_PAGE(0))
> +			page = NULL;	/* zero page: ignore */
> +		else if (cow && page_mapping(page) != NULL)
> +			page = NULL;	/* clean cow: ignore */
> +		else {
> +			get_page(page);
> +			*(addrp++) = addr;
> +			*(pagep++) = page;
> +			if (--nr == 0) {
> +				addr += PAGE_SIZE;
> +				break;
> +			}
> +		}
> +
> +		addr += PAGE_SIZE;
> +	}
> +
> +	if (unlikely(ret < 0)) {
> +		nr = pgarr->nleft - nr;
> +		while (nr--)
> +			page_cache_release(*(--pagep));
> +		return ret;
> +	}
> +
> +	*start = addr;
> +	return (pgarr->nleft - nr);
> +}
> +

[...]

> +int cr_write_mm(struct cr_ctx *ctx, struct task_struct *t)
> +{
> +	struct cr_hdr h;
> +	struct cr_hdr_mm *hh = ctx->tbuf;
> +	struct mm_struct *mm;
> +	struct vm_area_struct *vma;
> +	int ret;
> +
> +	h.type = CR_HDR_MM;
> +	h.len = sizeof(*hh);
> +	h.id = ctx->pid;
> +
> +	mm = get_task_mm(t);
> +
> +	hh->tag = 1;	/* non-zero will mean first time encounter */
> +
> +	hh->start_code = mm->start_code;
> +	hh->end_code = mm->end_code;
> +	hh->start_data = mm->start_data;
> +	hh->end_data = mm->end_data;
> +	hh->start_brk = mm->start_brk;
> +	hh->brk = mm->brk;
> +	hh->start_stack = mm->start_stack;
> +	hh->arg_start = mm->arg_start;
> +	hh->arg_end = mm->arg_end;
> +	hh->env_start = mm->env_start;
> +	hh->env_end = mm->env_end;
> +
> +	hh->map_count = mm->map_count;

Some fields above should also be protected with mmap_sem, like ->brk,
->map_count, and possibly others (I'm not a memory expert though).

> +
> +	/* FIX: need also mm->flags */
> +
> +	ret = cr_write_obj(ctx, &h, hh);
> +	if (ret < 0)
> +		goto out;
> +
> +	/* write the vma's */
> +	down_read(&mm->mmap_sem);
> +	for (vma = mm->mmap; vma; vma = vma->vm_next) {
> +		if ((ret = cr_write_vma(ctx, vma)) < 0)
> +			break;
> +	}
> +	up_read(&mm->mmap_sem);
> +
> +	if (ret < 0)
> +		goto out;
> +
> +	ret = cr_write_mm_context(ctx, mm);
> +
> + out:
> +	mmput(mm);
> +	return ret;
> +}

[...]

> diff --git a/ckpt/restart.c b/ckpt/restart.c
> new file mode 100644
> index 0000000..9f52851
> --- /dev/null
> +++ b/ckpt/restart.c

[...]

> +/* read the task_struct into the current task */
> +static int cr_read_task_struct(struct cr_ctx *ctx)
> +{
> +	struct cr_hdr_task *hh = cr_hbuf_get(ctx, sizeof(*hh));
> +	struct task_struct *t = current;
> +	int ret;
> +
> +	ret = cr_read_obj_type(ctx, hh, sizeof(*hh), CR_HDR_TASK);
> +	if (ret < 0)
> +		return ret;
> +
> +	/* for now, only restore t->comm */

+	/* current should already have correct pid and tgid */

> +	if (hh->task_comm_len < 0 || hh->task_comm_len > TASK_COMM_LEN)
> +		return -EINVAL;
> +
> +	memset(t->comm, 0, TASK_COMM_LEN);
> +	memcpy(t->comm, hh->comm, hh->task_comm_len);
> +
> +	cr_hbuf_put(ctx, sizeof(*hh));
> +	return 0;
> +}
> +

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes




----- End forwarded message -----

             reply	other threads:[~2008-07-30 16:52 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-07-30 16:52 Serge E. Hallyn [this message]
     [not found] ` <20080730165249.GA23802-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-07-30 17:40   ` [RFC][PATCH 2/2] CR: handle a single task with private memory maps Dave Hansen
2008-07-31 13:59     ` Louis Rilling
     [not found]       ` <20080731135910.GD22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-07-31 14:14         ` Serge E. Hallyn
  -- strict thread matches above, loose matches on Subject: below --
2008-07-30  3:27 Oren Laadan
     [not found] ` <Pine.LNX.4.64.0807292325290.9868-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
2008-07-30  4:51   ` KOSAKI Motohiro
     [not found]     ` <20080730132257.9DF2.KOSAKI.MOTOHIRO-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2008-07-30 18:22       ` Oren Laadan
2008-07-30 20:58   ` Dave Hansen
2008-07-30 22:07   ` Serge E. Hallyn
     [not found]     ` <20080730220752.GA3518-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-07-30 22:20       ` Oren Laadan
     [not found]         ` <4890E930.9090204-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-07-31 13:57           ` Louis Rilling
     [not found]             ` <20080731135703.GC22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-07-31 15:09               ` Oren Laadan
     [not found]                 ` <4891D5C2.8090000-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-07-31 15:58                   ` Louis Rilling
     [not found]                     ` <20080731155856.GH22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-07-31 16:28                       ` Oren Laadan
     [not found]                         ` <4891E849.1050701-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-07-31 17:50                           ` Louis Rilling
     [not found]                             ` <20080731175058.GI22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-07-31 19:12                               ` Oren Laadan
     [not found]                                 ` <48920EA0.1060608-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-08-01 10:26                                   ` Louis Rilling
     [not found]                                     ` <20080801102600.GJ22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-08-01 14:15                                       ` Oren Laadan
     [not found]                                         ` <48931A7E.1040302-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-08-01 18:00                                           ` Louis Rilling
     [not found]                                             ` <20080801180038.GL22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-08-01 18:51                                               ` Oren Laadan
     [not found]                                                 ` <48935B4D.7070302-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-08-04 10:16                                                   ` Louis Rilling
2008-08-05  2:37                                                     ` Oren Laadan
     [not found]                                                       ` <4897BCE0.1080508-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-08-05  3:51                                                         ` Joseph Ruscio
     [not found]                                                           ` <1FA56146-7C30-4C36-982D-A50AA8BC8392-ccALPSaRSA5Wk0Htik3J/w@public.gmane.org>
2008-08-05  9:19                                                             ` Louis Rilling
2008-08-05 16:20                                                               ` Oren Laadan
     [not found]                                                                 ` <48987DE7.3060408-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-08-06 15:41                                                                   ` Joseph Ruscio
     [not found]                                                                     ` <3A99F254-E9B3-484B-85B0-29023ADA04C4-ccALPSaRSA5Wk0Htik3J/w@public.gmane.org>
2008-08-07  9:25                                                                       ` Louis Rilling
2008-08-05 16:23                                                             ` Dave Hansen
2008-08-06 16:15                                                               ` Joseph Ruscio
     [not found]                                                                 ` <FE4D936E-06F1-45D2-8E7C-85D87149BDC0-ccALPSaRSA5Wk0Htik3J/w@public.gmane.org>
2008-08-07  9:29                                                                   ` Louis Rilling
2008-08-08 17:20                                                               ` Joseph Ruscio
     [not found]                                                                 ` <03CE5BD3-E84A-4617-93BC-722ECB846C63-ccALPSaRSA5Wk0Htik3J/w@public.gmane.org>
2008-08-08 17:24                                                                   ` Dave Hansen
2008-08-05  9:32                                                         ` Louis Rilling
2008-07-31 21:25           ` Serge E. Hallyn
     [not found] ` <20080730161535.GB22403@hawkmoon.kerlabs.com>
     [not found]   ` <20080730161535.GB22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-07-30 18:27     ` Oren Laadan
     [not found]       ` <4890B2A8.8010808-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-07-31 14:08         ` Louis Rilling
     [not found]           ` <20080731140844.GE22403-Hu8+6S1rdjywhHL9vcZdMVaTQe2KTcn/@public.gmane.org>
2008-07-31 14:44             ` Oren Laadan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080730165249.GA23802@us.ibm.com \
    --to=serue-r/jw6+rmf7hqt0dzr+alfa@public.gmane.org \
    --cc=Louis.Rilling-aw0BnHfMbSpBDgjK7y7TUQ@public.gmane.org \
    --cc=containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.